Introduction

Two previously published observations were combined to suggest a new physical method for assessing breast cancer risk. The first showed that optical transillumination spectroscopy can provide information about the molecular constituents of breast tissue. The second showed that parenchymal density patterns, which are caused by differences in the molecular composition of breast tissue, carry the highest odds ratio for breast cancer of any physical examination method. This study therefore set out to establish whether, and how strongly, optical transillumination spectroscopy (OTS) correlates with parenchymal density patterns, as an intermediary towards breast cancer risk, in a case-control cross-sectional study of 300 women. The study was based on two hypotheses: 1) transillumination spectroscopy of the female breast correlates with the parenchymal tissue density patterns demonstrated by x-ray mammography, and 2) transillumination spectra can be understood quantitatively in terms of constituent tissue chromophores and morphology through analytical modelling of the spectra. To date, 282 women have been recruited into the study; the shortfall relative to the intended recruitment is related to SARS.

We established that OTS can predict high versus low tissue density with a sensitivity and a specificity of > 0.97 each, thus providing effectively the same odds ratio for breast cancer as parenchymal density patterns, without the use of ionizing radiation and without the need for a trained radiologist to evaluate mammograms.

Body 


The previously approved statement of work for this project is listed below, together with the outcomes achieved during the study period to date. It is important to note that the project yielded a large body of results, the analysis of which will most likely continue for the next few months.

Task 1. Instrument improvements (1-4 months)

• A spectrophotometer with an extended wavelength range will be constructed using Si- and InGaAs-based CCD arrays, with a bifurcated fiber bundle directing the light from the skin to the two detectors. Note that funds are requested only for the InGaAs detector and the associated spectrophotometer. Additionally, an excitation light source delivering ~500 mW in the 550 nm to 1.3 µm range will be designed. (1-4 months)

• The system will be tested initially on 6-8 volunteers to demonstrate comfort and safe operation (e.g. no heating of the skin) and to optimise the signal integration parameters. (3-4 months)

The spectrophotometer and light source were constructed and tested on a limited number of volunteers. The light source could deliver up to 400 mW to the tissue in the intended optical window from 550 nm to 1.3 µm. Initial testing was performed exclusively on Caucasian volunteers (n = 6), who were instructed to report any sensation of warmth or heat; more than half of them reported warmth. Reducing the total delivered power to approximately 250 mW spread over the entire wavelength band resulted in a poor signal-to-noise ratio. We therefore decided to restrict the wavelength band to 550 nm to 1.1 µm at 250 mW total power and to test this configuration on a larger group of volunteers (n = 28), in order to determine the achievable signal-to-noise ratio and the predictive power of OTS using only the restricted band. As the analysis of this second subset was encouraging in terms of predictive power (using PCA, see below) and signal-to-noise ratio, and no further warmth was reported in the irradiated tissue, we continued with the restricted wavelength band. Additionally, our initial exclusion of women with highly pigmented skin, as in Latin American and African American women, was no longer required and was dropped. Table 1 shows the ethnic makeup of the study population in relation to the general population of the Greater Toronto Area. Figure 1 shows a volunteer undergoing OTS. For a more detailed description of the instrument hardware and the execution of the measurements, please see Simick et al. [1], also provided in Appendix 1. That publication also describes how the attenuation spectra were derived from the raw spectra using a daily transillumination standard to correct for variations in the wavelength-dependent transfer function of the system.
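The correction against a daily standard can be illustrated with a minimal sketch. It assumes that attenuation is taken as the base-10 logarithm of the ratio of the standard's spectrum to the tissue spectrum, so that any wavelength-dependent factor common to both (source output, fibre transmission, detector response) cancels and the result is attenuation relative to the standard. The function name `attenuation_spectrum` and the optional dark-count subtraction are hypothetical; the procedure in Simick et al. [1] may include further steps, such as the thickness correction mentioned later.

```python
import numpy as np

def attenuation_spectrum(tissue_counts, standard_counts, dark_counts=None):
    """Attenuation relative to the standard: A(lambda) = log10(I_std / I_tissue).

    Assumption: both spectra pass through the same wavelength-dependent
    transfer function T(lambda) of the system, which cancels in the ratio.
    """
    tissue = np.asarray(tissue_counts, dtype=float)
    standard = np.asarray(standard_counts, dtype=float)
    if dark_counts is not None:
        # Optional dark-count subtraction (hypothetical, not described in the text).
        dark = np.asarray(dark_counts, dtype=float)
        tissue, standard = tissue - dark, standard - dark
    if np.any(tissue <= 0) or np.any(standard <= 0):
        raise ValueError("counts must be positive")
    return np.log10(standard / tissue)

# Synthetic check: an arbitrary system response T cancels out.
T = np.array([0.2, 1.0, 3.0])
true_transmission = np.array([1e-2, 1e-3, 1e-1])
A = attenuation_spectrum(T * true_transmission, T * 1.0)
print(A)
```

Because T multiplies both spectra, the printed attenuation depends only on the synthetic tissue transmission.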


Table 1. Ethnic composition of study volunteers (May 2003)

Ethnic group                  % Study population    % General population, Greater Toronto Area
Caucasian                     88.8                  63.2
South Asian                   3.8                   10.2
Southeast Asian               0.0                   1.2
Black                         5.1                   6.7
Indigenous                    0.3                   0.4
Others, including Hispanics   2.0                   18.4

Figure 1. Setup for transillumination of the breast for cancer risk assessment. Light is delivered via a liquid light guide; the detector fiber is visible at the bottom of the image. Minimal pressure is applied, only to provide good coupling between the optodes and the tissue.


Task 2. Correlation of optical transillumination spectroscopy and parenchymal tissue density pattern (4-30 months). Related to specific aim 1.

• A total of 300 subjects will contribute to the spectroscopy database, stratified into 6 groups (low, medium and high parenchymal density pattern for pre- and post-menopausal women, respectively) (5-30 months).

• A subgroup of the subjects (60) will be asked to participate in the low-resolution sector scans to analyse the variability of transillumination patterns with local changes in the parenchymal density pattern (6-24 months).

• Initial set-up of the PCA model for extracting spectral contributions and ranges of interest (4-8 months).

• Establishing an initial model identifying wavelength ranges that show possible correlation with tissue density (12 months and 24 months).

Just prior to the outbreak of SARS in various Toronto hospitals (March 2003) and the associated restrictions on allowing volunteers into any hospital, including the institute where this study was conducted, recruitment stood at 282 volunteers. (We are currently completing recruitment; all 13 volunteers are scheduled by August 15th.) A global PCA-based analysis is available for all 282 volunteers; however, some specific tests have been completed only on a subset of 156 volunteers, and some analysis methods are still pending. Table 2 shows the recruitment breakdown by tissue density for those volunteers whose classification is available, along with a comparison to the population proportions from the Canadian National Breast Cancer Screening Survey [2]. Because the study and population proportions are similar, all analyses are presented on the assumption that the results are representative of those expected for the general population. This deviates from the initial statement of work, but we feel that the conclusions to be drawn from this study are now stronger. Table 2 also gives the total number of spectra available for the most recent PCA. As a first approach, we did not stratify by menopausal status, week of the menstrual cycle (if applicable), ethnic background, age, body mass index (BMI), parity or measurement position.


Table 2. Breakdown of study volunteers included in most analyses to date, including study and population proportions (numbers in parentheses refer to the total number of optical spectra available to date).

Density category   Training set   Validation set   Total        Study proportion (%)   Population proportion (%)
Low                80 (640)       26 (208)         106 (848)    37.6                   37
Medium             103 (824)      34 (272)         137 (1096)   48.6                   49
High               30 (240)       9 (72)           39 (312)     13.8                   14
Totals             213 (1704)     69 (552)         282 (2256)


Low-resolution sector scans of the breast were performed on only 4 volunteers. This subtask was not pursued further for two reasons. First, the volume optically interrogated can be up to 25 cc for a 5 cm interoptode distance, so small variations in tissue density between sectors are hardly noticeable. Second, because photon losses at the edge of the breast can introduce variability comparable to that caused by density changes in these rather large volumes, spectral changes are difficult to attribute to either changes in the boundary losses or changes in density. Figure 2 illustrates the changes in the transmission spectra of a volunteer undergoing a 1D sector scan from the centre of the breast to the medial edge.


Figure 2. Changes in transillumination spectra during a 1D sector scan in a Caucasian volunteer, starting at the centre and moving towards the medial edge of the breast (curves: centre, left medial, left medial 2, left medial 3). 'Left medial' is about 2 cm from the edge and represents the normal measurement position, 'left medial 2' is 1.5 cm and 'left medial 3' is 1 cm from the edge. No significant spectral changes are seen for the 1 cm motion of the optode pair.

Figure 3 shows examples of thickness- and transfer-function-corrected transillumination spectra for a) women with low and b) women with high tissue density according to the radiologist.

PCA training on the entire data set resulted in the 4 principal components depicted in Figure 4. Table 3, column 'All positions', gives the variance of the entire data set captured by these 4 principal components. As components 2 to 4 capture only a small amount of the variance, their order sometimes changes when various subgroups are analysed, as shown below.

Table 3. Variance [%] accounted for by each principal component for all positions and for each measurement position.

Principal component   All positions   Center   Medial   Distal   Lateral
P1                    99.86           99.85    99.89    99.87    99.88
P2                    0.07            0.09     0.06     0.07     0.06
P3                    0.05            0.04     0.03     0.04     0.04
P4                    0.01            0.01     0.01     0.01     0.01

Figure 3. Representative examples of thickness- and transfer-function-corrected transillumination spectra from women with either a) low or b) high tissue density.

Figure 4. Resulting principal component spectra (PC1 to PC4) from training on non-stratified data (n = 1704).


As described in Appendices 1 and 2, the principal component spectra carry information about tissue chromophores such as lipids (925 nm), water (970 nm) and oxyhaemoglobin (~750 nm), as well as deoxyhaemoglobin, melanin and the overall light scattering.

Principal component analysis thus decomposes any given transillumination spectrum into component scores t1 to t4. Plotting these component scores against one another in either 2D or 3D space permits identification of related spectra exhibiting similar traits.
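The decomposition into component scores can be sketched as follows, assuming the corrected spectra are stacked as the rows of a matrix and PCA is computed by singular value decomposition (SVD). The function name `pca_scores` and its `center` switch are hypothetical. Whether the spectra were mean-centred before the decomposition is not stated here; without centring, the first component largely describes the average attenuation spectrum, which would be consistent with it capturing almost all of the variance, as in Table 3.

```python
import numpy as np

def pca_scores(spectra, n_components=4, center=True):
    """Decompose spectra (n_spectra x n_wavelengths) into PCA scores t_i.

    Returns (components, scores, explained_percent, mean). Rows of
    `components` are the principal component spectra PC1, PC2, ...
    """
    X = np.asarray(spectra, dtype=float)
    mean = X.mean(axis=0) if center else np.zeros(X.shape[1])
    Xc = X - mean
    # Thin SVD: Xc = U diag(s) Vt; the rows of Vt are orthonormal loadings.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    # Score t_i of each spectrum = projection onto component i.
    scores = Xc @ components.T
    # Share of the total sum of squares captured by each component, in %.
    explained = 100.0 * s[:n_components] ** 2 / np.sum(s ** 2)
    return components, scores, explained, mean

# Synthetic data: 30 spectra built from two hypothetical "chromophore" shapes.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 50))
coeffs = rng.normal(size=(30, 2))
spectra = coeffs @ basis + 5.0
pcs, t, pct, mu = pca_scores(spectra, n_components=2)
print(np.round(pct, 3))
```

Each spectrum is then approximated by the mean plus the sum of t_i times PC_i, and pairs or triples of columns of `t` are what the cluster plots display.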

Figure 5 shows a 2D plot of the data in which clusters are defined by ellipsoids giving iso-probability lines for finding members of a certain class (either high or low tissue density) within them. To calculate the sensitivity and specificity of OTS for predicting high tissue density, two analytical means of cluster separation were investigated. First, each cluster is analytically described by a line or plane of best fit, and the median line or plane between them is calculated. Second, based on the iso-probability ellipsoids, a tangent line or plane is determined at which the iso-probability ellipsoids of the two clusters just touch. It is worth noting that the lengths of the ellipsoid half-axes are calculated from the frequency histograms of the data points in these cluster plots along each ellipsoid axis.
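The first separation scheme can be sketched in two dimensions. The sketch assumes that each cluster's line of best fit is its principal axis (a total-least-squares fit), and that the median line takes the averaged direction of the two fitted lines and passes through the midpoint of the two cluster centroids. These construction details and the function names are assumptions for illustration, not the exact procedure used in this study; the tangent scheme, which depends on the histogram-derived half-axes, is not shown.

```python
import numpy as np

def principal_axis(points):
    """Centroid and unit direction of a cluster's total-least-squares line."""
    P = np.asarray(points, dtype=float)
    centroid = P.mean(axis=0)
    _, _, Vt = np.linalg.svd(P - centroid, full_matrices=False)
    return centroid, Vt[0]

def median_line(cluster_a, cluster_b):
    """Line midway between the best-fit lines of two 2D clusters.

    Returns (point_on_line, unit_normal); the normal points towards
    cluster_a. Assumes (hypothetically) the fitted lines are roughly parallel.
    """
    ca, da = principal_axis(cluster_a)
    cb, db = principal_axis(cluster_b)
    if np.dot(da, db) < 0:                  # SVD signs are arbitrary; align them
        db = -db
    direction = (da + db) / np.linalg.norm(da + db)
    normal = np.array([-direction[1], direction[0]])
    midpoint = 0.5 * (ca + cb)
    if np.dot(ca - midpoint, normal) < 0:   # orient the normal towards cluster_a
        normal = -normal
    return midpoint, normal

def classify(points, midpoint, normal):
    """+1 for points on cluster_a's side of the median line, -1 otherwise."""
    signed = (np.asarray(points, dtype=float) - midpoint) @ normal
    return np.where(signed >= 0, 1, -1)

# Toy (t1, t2) scores: 'high' cluster above, 'low' cluster below.
high = [[-2, 2.0], [0, 2.1], [2, 2.0], [4, 1.9]]
low = [[-2, -2.0], [0, -2.1], [2, -2.0], [4, -1.9]]
mid, nrm = median_line(high, low)
print(classify(high, mid, nrm), classify(low, mid, nrm))
```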


Figure 5. Two-dimensional cluster plots of a) t1 versus t2 and b) t1 versus t3 resulting from thickness- and transfer-function-corrected spectra of high and low tissue density. Only the training data set is shown (n = 1704). Blue circles, spectra from tissue classified as low density; red circles, spectra from tissue classified as high density. The median axis (green) and the new tangent axis (black) are shown.


Figure 6. Left: three-dimensional cluster plot of t1 versus t3 and t4 resulting from thickness- and transfer-function-corrected spectra of high and low tissue density. Only the training data set is shown (n = 1706). Dark points, data from low-density tissue; light points, spectra from high-density tissue. The new tangent plane (vertical) and the median plane (horizontal) are shown. Right: the resulting ellipsoids indicating iso-probability lines, with only the tangent plane of separation shown.
Furthermore, it should be noted that the cluster of the low tissue density group is significantly larger than that of the high tissue density group. Figure 6 shows an example of a 3D plot with the respective median plane and tangent plane indicated.

We calculated a high density measure (HDM) and a low density measure (LDM), defined as the number of tissue volumes correctly identified by OTS as high (or low) tissue density, divided by the total number of tissue volumes identified by the radiologist as high (or low) tissue density, respectively. Hence, HDM represents the sensitivity and LDM the specificity for the detection of breast tissue with high tissue density. Table 4 presents the HDM and LDM for the two cluster separation schemes, with PCA trained on measurements from 'All positions'. That the LDM is lower than the HDM can be attributed to the definition of low tissue density as 'less than 25% of the mammographic area is covered by densities'; an individual spectrum from one quadrant can therefore still be representative of high tissue density at that position.
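The HDM and LDM defined above are the usual sensitivity and specificity of a binary classifier. A minimal sketch, assuming one boolean OTS call and one boolean radiologist classification per tissue volume (the function name `hdm_ldm` is hypothetical):

```python
import numpy as np

def hdm_ldm(ots_high, radiologist_high):
    """HDM (sensitivity) and LDM (specificity) for detecting high density.

    ots_high, radiologist_high: one boolean per tissue volume (spectrum),
    True meaning 'high tissue density'.
    """
    ots = np.asarray(ots_high, dtype=bool)
    truth = np.asarray(radiologist_high, dtype=bool)
    if truth.all() or not truth.any():
        raise ValueError("need both high- and low-density volumes")
    hdm = np.sum(ots & truth) / np.sum(truth)       # correct highs / all highs
    ldm = np.sum(~ots & ~truth) / np.sum(~truth)    # correct lows / all lows
    return float(hdm), float(ldm)

truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
ots = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(hdm_ldm(ots, truth))
```

Here 3 of 4 high-density volumes and 5 of 6 low-density volumes are classified correctly.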


Table 4. HDM and LDM (%) for training and validation sets using all component scores or scores for individual positions, as derived from 3D cluster plots. Results using either the tangent plane or the median plane are presented.

                                       Tangent plane                         Median plane
                                       Training set     Validation set       Training set     Validation set
Position        Scores (x, y, z)       HDM     LDM      HDM      LDM         HDM     LDM      HDM      LDM
All positions   …, t4                  88.3    85.8     98.6     88.9        68.3    63.0     76.4     78.9
Center          t1, t3, t4             96.7    84.4     94.4     92.3        71.7    55.6     83.3     36.5
Medial          t1, t3, t4             90.0    85.0     94.4     92.3        20.0    56.3     88.9     53.9
Distal          t1, t2, t3             96.7    85.0     100.0    94.2        85.0    88.1     77.8     90.4
Lateral         t1, t2, t4             96.7    94.4     100.0    92.3        91.7    73.8     100.0    90.4

It should be stressed that very high HDM and LDM can be achieved with the tangent plane without any stratification. Additionally, while the spectroscopic technique is sensitive only to average volumetric properties, the volume sampled by each spectrum is less than 25% of the breast tissue, whereas parenchymal density is a global measure. Hence, averaging the scores from all 8 spectra collected per volunteer was tested as a possible means of improving the HDM and LDM. Results based on the median plane of separation for 2 different representations of the 3D data are presented in Table 5, indicating an improvement in each case. Figure 7 shows the improvement in graphical form.
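Averaging the scores of all spectra collected from one volunteer before classification can be sketched as follows, assuming each spectrum's scores are stored together with a volunteer identifier. The function name and data layout are hypothetical.

```python
import numpy as np

def mean_scores_per_volunteer(scores, volunteer_ids):
    """Average component scores over all spectra of each volunteer.

    scores: (n_spectra, n_components); volunteer_ids: (n_spectra,).
    Returns (unique_ids, mean_scores) with one row per volunteer.
    """
    scores = np.asarray(scores, dtype=float)
    unique_ids, inverse = np.unique(np.asarray(volunteer_ids), return_inverse=True)
    inverse = inverse.ravel()                   # keep 1-D on all NumPy versions
    sums = np.zeros((unique_ids.size, scores.shape[1]))
    np.add.at(sums, inverse, scores)            # unbuffered sum per volunteer
    counts = np.bincount(inverse, minlength=unique_ids.size)
    return unique_ids, sums / counts[:, None]

# Two spectra each from two hypothetical volunteers (in practice 8 per volunteer).
ids, means = mean_scores_per_volunteer(
    [[1, 2], [3, 4], [10, 20], [30, 40]], ["a", "a", "b", "b"])
print(ids, means)
```

The averaged rows can then be classified with the same separation plane as individual spectra.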


Table 5. HDM and LDM for training and validation sets using either the scores from all 8 spectra per volunteer or the mean score per individual.


Equation  Used 

Training  Set 

HDM  LDM 

Validation  Set 

HDM  LDM 

All  scores 

tjijh) 

76.9% 

69.4% 

88.3% 

87.5% 

75.0% 

64.3% 

96.7% 

92.5% 

Mean  scores 

85.0% 

85.0% 

89.1% 

82.6% 

85.7% 

71.4% 

100.0% 

93.3% 

Thus, we have established that OTS is a valid physical assessment technique for tissue density. As the HDM and LDM are close to 0.9, OTS will provide an odds ratio for breast cancer risk similar to that of parenchymal density patterns. This is achieved without stratifying the volunteer population by the demographic risk factors mentioned above, which are often employed in risk models such as the Gail Risk Model [3]. Additionally, the effects of hormonal variations during the menstrual cycle on the transillumination spectra were not considered. For a more detailed statistical analysis based on n = 156 volunteers, please see Appendix 3, a paper submitted to Cancer Research for peer review on July 21st, 2003.


Figure 7. Three-dimensional cluster plots of a) t2 versus t1 and t3 (n = 528) and b) t2 versus t1 and t3 (n = 88). Shown are scores from spectra of high (solid symbols) and low (open symbols) density tissue after thickness and transfer function correction.


Effects of the menstrual cycle

Changes in the PCA scores of 20 women, derived from transillumination spectra collected 4 times during a menstrual cycle, are displayed for the left centre position in Figure 8. Plots for the other measurement positions showed similar results. From this we concluded that timing OTS to a particular period of the menstrual cycle is not required for pre-menopausal women and women on HRT.


Figure 8. Scores t2 versus t1 for n = 20 women presenting 4 times during a menstrual cycle (legend: low and high tissue density, weeks 1 to 4). Shown are results for the left centre measurement position only. Low and high refer to low and high tissue density, respectively. A PCA model trained on n = 88 volunteers was used.




Effects of age

It was previously shown [5] that transillumination spectra collected from breast tissue change with age due to the atrophy of glandular tissue before, during and after menopause. In a subgroup of n = 88 volunteers, we analysed the PCA scores as a function of age following age-independent training of the PCA components. Figure 9 shows the results for the individual means of the first 2 component scores for high and low tissue densities. While component score 1 shows a slight increase with age for both low and high tissue densities, component score 2 appears to be independent of age. The first result is anticipated; the second is somewhat surprising.

As shown below under the work performed for Task 3, component score 1 is thought to be inversely related to the overall light scattering in breast tissue, and atrophy during menopause replaces highly scattering glandular tissue with less scattering adipose tissue. That the slope for high tissue density is steeper than that for low tissue density is also anticipated, as more glandular tissue is available to undergo atrophy. Principal component 2 shows an inverse lipid (positive) and water (negative) absorption peak (see Figure 4); it was therefore anticipated that replacement of glandular by adipose tissue would result in a positive slope, specifically around age 50 when atrophy is fastest. It should be noted that, particularly for high tissue density, the regression analysis of the age dependence of the component scores resulted in low correlation coefficients, so these dependencies should not be over-interpreted.
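The age (or BMI) dependence of a component score and its correlation coefficient can be quantified with an ordinary least-squares line and Pearson's r, computed separately for each density class. This is a minimal sketch with a hypothetical function name and made-up example values, not the statistical procedure reported in Appendix 3. The square of r gives the fraction of score variance explained by age, which is why low r values warn against over-interpreting a fitted slope.

```python
import numpy as np

def score_vs_covariate(x, score):
    """Least-squares line score = slope * x + intercept, plus Pearson r."""
    x = np.asarray(x, dtype=float)
    score = np.asarray(score, dtype=float)
    slope, intercept = np.polyfit(x, score, deg=1)
    r = float(np.corrcoef(x, score)[0, 1])
    return float(slope), float(intercept), r

# Hypothetical per-volunteer mean t1 values for one density class.
ages = [42, 47, 51, 55, 60, 66]
t1_means = [1.02, 1.05, 1.03, 1.08, 1.06, 1.10]
slope, intercept, r = score_vs_covariate(ages, t1_means)
print(f"slope = {slope:.4f} per year, r = {r:.2f}, r^2 = {r * r:.2f}")
```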
Figure 9. Scatter plots of averaged component scores per individual, left t1 and right t2, as a function of age for high and low density categories (n = 88). Open circles and solid regression line represent low tissue density; closed circles and dashed line represent high tissue density.

To further investigate whether stratification of the volunteers by age is warranted, we trained PCA models independently for volunteers younger than 50 and older than 55 years of age. The derived principal components were comparable in shape and magnitude between the two age ranges, except that the water contribution in principal component 2 and the lipid contribution in principal component 3 are lower for women under 50. A Kruskal-Wallis analysis of the component scores between the two groups revealed no statistically significant differences, with the exception of effects of body mass index (BMI).
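For two groups, the Kruskal-Wallis H statistic can be computed from pooled ranks and compared with a chi-square distribution with one degree of freedom. The sketch below uses only the Python standard library, gives tied values their average rank, applies the usual tie correction, and uses the fact that for one degree of freedom the chi-square survival function equals erfc(sqrt(H/2)). The function names are hypothetical; in practice a library routine such as `scipy.stats.kruskal` would normally be preferred.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_wallis_two_groups(group_a, group_b):
    """Tie-corrected Kruskal-Wallis H and its p-value for two groups (df = 1)."""
    pooled = list(group_a) + list(group_b)
    n, n_a, n_b = len(pooled), len(group_a), len(group_b)
    ranks = average_ranks(pooled)
    r_a, r_b = sum(ranks[:n_a]), sum(ranks[n_a:])
    h = 12.0 / (n * (n + 1)) * (r_a ** 2 / n_a + r_b ** 2 / n_b) - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    correction = 1.0 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    if correction > 0:
        h /= correction
    h = max(h, 0.0)                       # guard against tiny negative rounding
    p = math.erfc(math.sqrt(h / 2.0))     # chi-square survival function, df = 1
    return h, p

# Toy scores for two hypothetical age groups.
h, p = kruskal_wallis_two_groups([1, 2, 3], [4, 5, 6])
print(f"H = {h:.3f}, p = {p:.4f}")
```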




Similarly, we investigated whether the PCA-derived component scores showed a correlation with BMI; the results are displayed in Figure 10. Here, as anticipated, correlations are identified for the first 2 principal components in both density classes: t1 indicates apparently reduced light scattering, and t2 increased lipid content in the breast tissue, for women with a higher body mass index. These results can be seen as additional evidence supporting the interpretation of the principal component spectra.


Figure 10. Scatter plots of averaged component scores per individual, a) t1 and b) t2, as a function of BMI for high and low density categories (n = 88). Open circles and solid regression line represent low tissue density; closed circles and dashed line represent high tissue density.


Effects of measurement position

A multivariate analysis of the PCA results showed significant differences in the component scores between the 4 interrogated quadrants. Figure 11 shows the mean and standard deviation of the first three component scores t1 to t3 as a function of measurement position for high and low tissue density. Subsequently, PCA was performed for each individual position. While the resulting component vectors were similar (see Appendix 3 for more details), the HDM and LDM improved significantly (see Table 4). The tangent-plane HDM and LDM in the validation sets are all > 0.92, further improving the density prediction value of OTS. From figure xxx it also becomes apparent that the scores for low tissue densities cluster more tightly than those for high tissue densities, possibly indicating that different anatomical structures contribute to the appearance of parenchymal density patterns.
Figure 11. Mean component scores, left t1, middle t2 and right t3, versus measurement position for the four left measurement positions (centre = LC, medial = LM, distal = LD, lateral = LL) for low (open circles) and high (closed circles) density tissue. Error bars represent 95% confidence intervals of the mean. Based on n = 88 volunteers.

Pooling of results by quadrant

As previously indicated, OTS samples the optical behaviour of breast tissue volumetrically. Each OTS spectrum by itself therefore undersamples the breast and may give an assessment that disagrees with the density classification, which is inherently a global tissue attribute. Figure 12 shows a frequency histogram of the number of spectra per individual predicting a given density (here high or low). If 3 of the 4 spectra collected at the centre and distal positions of both breasts must predict high density for high density to be assigned globally, the HDM and LDM are > 0.97. The predictive values achieved for a similar condition using the lateral and medial positions are lower. That the combination of the centre and distal positions carries a higher predictive value is not surprising, since both quadrants exhibit densities less likely to be seen in the other two quadrants, as previously also shown by Wolfe et al. [5,6].
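The pooling rule described above amounts to a vote over the four centre and distal spectra of both breasts. A minimal sketch, assuming one boolean call per spectrum arranged as one row per volunteer; the column order, the function name `pooled_high_density` and the example calls are hypothetical.

```python
import numpy as np

def pooled_high_density(spectrum_calls, min_votes=3):
    """Global high-density call per volunteer by vote.

    spectrum_calls: (n_volunteers, 4) booleans, True = spectrum predicts
    high density. Assumed column order (hypothetical): left centre,
    left distal, right centre, right distal.
    """
    calls = np.asarray(spectrum_calls, dtype=bool)
    return calls.sum(axis=1) >= min_votes

calls = [[1, 1, 1, 0],   # 3 of 4 -> high
         [1, 0, 0, 0],   # 1 of 4 -> low
         [1, 1, 1, 1],   # 4 of 4 -> high
         [0, 0, 1, 1]]   # 2 of 4 -> low
radiologist_high = np.array([True, False, True, True])
pooled = pooled_high_density(calls)
hdm = pooled[radiologist_high].mean()        # correct highs / all highs
ldm = (~pooled[~radiologist_high]).mean()    # correct lows / all lows
print(pooled, hdm, ldm)
```

Raising `min_votes` trades HDM for LDM, which is the choice the 3-of-4 rule fixes.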




Figure  12.  Frequency  histogram  of  the  number  of  spectra  from  left)  the  center  and  distal  positions  and  right)  the 
medial  and  lateral  positions,  that  correctly  predicted  high  tissue  density.  High  tissue  density  is  shown  in  black  and 
low  tissue  density  is  shown  in  grey.  Results  are  based  on  n=88  volunteers. 
Task 3. Derivation of tissue chromophore concentrations and light scattering properties of the breast tissue (4-34 months). Related to specific aim 2.

• Setting up of PLS analysis to extract tissue chromophores from transillumination spectra by creating a look-up table for light transmission as a function of tissue optical properties (4-6 months).

• A total of 60 subjects will contribute to the frequency domain spectroscopy database to derive the light scattering properties as a function of hormonal status (6-18 months).

• Determination of chromophore concentrations for all collected spectra of the subjects (12-36 months).

• Establishing an initial model identifying chromophore concentration ranges/ratios that show possible correlation with tissue density (20 months and 28 months).


To correlate mammographically derived tissue density on an interval scale with OTS, the research staff had to be trained in a computer-assisted analysis program (Cumulus) developed by the group of Dr. Yaffe [7]. Only recently have we achieved a Cumulus training level deemed sufficient for implementation of the PLS analysis. The required training level is that persons performing the % density assessment from mammograms achieve a correlation of > 0.85 for repeat assessments of the same mammograms. An additional complication was that the breast imaging centre used for patient recruitment switched last year to exclusive use of digital mammograms, and the computer-assisted program does not have the data file filters for the DICOM file format used in digital mammography. We are currently waiting for our collaborator Dr. Yaffe to complete these filters (ETA August 2003). Some results are shown below to demonstrate the ability of OTS to predict tissue density also on an interval scale. To date the correlation is not entirely satisfactory, as the repeat correlation for the staff extracting the % density from mammograms was only ~0.8, which limits the accuracy of the PLS training. Figure 13 shows an example of PLS-derived predictions of tissue density for a reader achieving a correlation coefficient of ~0.8 versus one achieving only 0.72, indicating the need for accurate and repeatable use of the Cumulus program.
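The repeat-assessment criterion can be checked with a few lines of code. This sketch assumes that the correlation meant is Pearson's r between two readings of the same mammograms (an intraclass correlation would be a possible alternative); the function name and the example readings are hypothetical.

```python
import numpy as np

def reader_repeatability(first_reading, second_reading, threshold=0.85):
    """Pearson r between two % density readings of the same mammograms,
    and whether it exceeds the required training threshold."""
    a = np.asarray(first_reading, dtype=float)
    b = np.asarray(second_reading, dtype=float)
    r = float(np.corrcoef(a, b)[0, 1])
    return r, r > threshold

# Hypothetical % density readings of four mammograms, read twice.
r, passed = reader_repeatability([10, 20, 30, 40], [12, 18, 33, 39])
print(f"r = {r:.3f}, sufficient: {passed}")
```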


Figure 13. Partial least squares regression: actual vs. predicted plots for top) a fairly trained reader and bottom) a poorly trained reader. Solid symbols are data points from the training set and open symbols those from the validation set. Confidence intervals: training set, light; validation set, dark.
From Figure 13 it becomes evident that the correlation between density predicted by OTS and density measured by mammography is mostly limited by the accuracy of the latter. Principal component regression was investigated as an alternative to PLS but did not improve the correlation between densities derived by OTS and by mammography. New PLS results will only become available in mid-August. We will also attempt to use the Gail score, a demographics-based risk predictor, as the reference standard for OTS analysis. Comparing the component vectors obtained with a physical-property-based standard and with a demographics-based standard may provide interesting information about elements common to the two.
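For readers unfamiliar with PLS, the sketch below fits a single-response PLS model (PLS1) with the NIPALS algorithm, predicting a continuous % density from spectra. It is a hypothetical illustration with synthetic data, not the software used in this study; an off-the-shelf alternative is `PLSRegression` in scikit-learn.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS (PLS1) model with the NIPALS algorithm.

    X: (n_samples, n_wavelengths) spectra; y: (n_samples,) % density.
    Returns a function that predicts y for new spectra.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean            # centred, then deflated below
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = E.T @ f                           # weight: covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                      # y already fully explained
            break
        w /= norm
        t = E @ w                             # latent score
        tt = t @ t
        p = E.T @ t / tt                      # X loading
        q = f @ t / tt                        # y loading
        E = E - np.outer(t, p)                # deflate X
        f = f - q * t                         # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    if not W:
        raise ValueError("no PLS component could be extracted")
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    # Coefficients in the original (centred) X space: B = W (P^T W)^-1 q.
    B = W @ np.linalg.solve(P.T @ W, Q)

    def predict(X_new):
        return (np.asarray(X_new, dtype=float) - x_mean) @ B + y_mean

    return predict

# Synthetic check: with all components, PLS1 reproduces an exact linear model.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 3.0
predict = pls1_fit(X, y, n_components=5)
print(np.max(np.abs(predict(X) - y)))
```

With real spectra, the number of components would be chosen on the validation set; fewer components trade fit for robustness against noise in the mammographic reference.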


The assembled frequency domain system comprised three emission wavelengths (785, 808 and 905 nm) modulated at 150 MHz, with heterodyne detection at 150.0002 MHz. The basic setup is described by Patterson et al. [8, 9]. A total of 15 women were recruited for the frequency domain measurements. (Recruitment fell short due to the outbreak of SARS in Toronto, and as some of the equipment components were only on loan, it is difficult to reactivate these experiments now that the SARS situation has improved.) The phase shift has been shown to be associated with the differential pathlength factor [10], which in turn represents the scattering power of the tissue. To separate the phase shift due to the tissue from that due to the optical path and the electronic components, measurements in distilled water were collected at the same interoptode distance, and that phase shift was subtracted from the one measured in the tissue.

Our hypothesis was that principal component 1 from the PCA above represents light losses due to increased pathlength and losses at the tissue boundary. Figure 14 shows the phase shift at three wavelengths (785, 808 and 905 nm) versus the component scores t1 and t2 for this subgroup of women. The phase shift correlates with PC1 but only weakly with PC2. Moreover, a high phase shift and a low t1 both indicate an increased optical pathlength, so the correlation is as expected. Surprisingly, the density groups do not cluster separately in the top row of graphs; that is, the optical pathlength does not depend on the radiologically derived density group. This indicates that photon scattering by itself does not differentiate between density groups using visible and NIR radiation, whereas Compton scattering of x-rays provides most of the contrast for risk assessment based on the parenchymal density pattern. This in turn indicates that OTS also holds information complementary to that of risk assessment based on ionizing radiation.
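The correlation described above, between the water-corrected phase shift and the component scores t1 and t2, can be quantified with Pearson's r. A minimal sketch, assuming one phase value per measurement and a score matrix with one column per component; the function name is hypothetical.

```python
import numpy as np

def score_phase_correlations(phase_shift, scores):
    """Pearson r between the phase shift and each column of component scores."""
    phase = np.asarray(phase_shift, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r
    return np.array([np.corrcoef(phase, scores[:, k])[0, 1]
                     for k in range(scores.shape[1])])
```

A strongly negative r for t1 and an r near zero for t2 would correspond to the pattern reported for Figure 14 (high phase shift with low t1).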

Another way of looking at these results is to calculate the differential pathlength factor (DPF) from the propagation of the photon density waves as a function of tissue density and wavelength. Note that the DPF depends on both the scattering and the attenuation coefficients, which change with wavelength and with tissue density classification. Figure 15 shows results similar to Figure 14, using the DPF as a function of component scores t1 and t2. The population averages as a function of wavelength are shown in Figure 16. Considering that at 905 nm the influence of lipid, the chromophore that determines density, is low, with possibly a minor effect of oxyhemoglobin (see Figure 4), the observed difference in DPF between the medium and high parenchymal tissue density groups must be attributed to an increase in scattering. Accepting this line of reasoning, the absence of a DPF difference at 808 nm indicates a strong increase in total hemoglobin absorption (805 nm is the isosbestic point of oxy- and deoxyhemoglobin) in the medium and high tissue density classes. Similar reasoning suggests a preferential increase in oxyhemoglobin, and thus a higher oxygen saturation, because the scattering increase is offset less at 785 nm, where deoxyhemoglobin is the stronger heme-based absorber. Figure 17 shows the change in DPF as a function of wavelength in representative individuals from the three tissue density classes. It is currently unclear whether individual DPF measurements can be exploited for risk assessment in the same way as OTS, which would open a second technological avenue for optical breast cancer risk assessment.
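The text does not state the formula used to convert phase shift into DPF. A common first-order relation for frequency-domain measurements takes the mean photon pathlength as <L> ≈ c·Δφ/(2π·f·n), with c the vacuum speed of light, f the modulation frequency and n the tissue refractive index, and then DPF = <L>/d for interoptode distance d. The sketch below applies the water-reference subtraction described earlier and then this relation. The refractive index of 1.4, the example readings and the function names are assumptions, and the approximation is strictly valid only when 2πf is small compared with μa·c/n, which may not hold at 150 MHz.

```python
import math

C_VACUUM_CM_PER_S = 2.998e10  # speed of light in vacuum (cm/s)

def tissue_phase(phi_tissue_rad, phi_water_rad):
    """Remove optical-path and electronic phase using the water reference
    recorded at the same interoptode distance."""
    return phi_tissue_rad - phi_water_rad

def differential_pathlength_factor(delta_phi_rad, mod_freq_hz, distance_cm, n_tissue=1.4):
    """First-order DPF = <L>/d with <L> ~ c * delta_phi / (2 pi f n) (assumed relation)."""
    mean_pathlength_cm = (C_VACUUM_CM_PER_S * delta_phi_rad
                          / (2.0 * math.pi * mod_freq_hz * n_tissue))
    return mean_pathlength_cm / distance_cm

# Hypothetical readings: 150 MHz modulation, 5 cm interoptode distance
dphi = tissue_phase(phi_tissue_rad=2.4, phi_water_rad=0.4)
dpf = differential_pathlength_factor(dphi, mod_freq_hz=150e6, distance_cm=5.0)
```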


Figure 14. Phase shift as a function of component scores t1 (top) and t2 (bottom) at three wavelengths (left to right: 785, 808 and 905 nm). The correlation is strong for component scores t1 but weak for t2.


Figure 15. Differential pathlength factor versus component scores t1 (top) and t2 (bottom) for the three tissue densities (left to right: low, medium and high) at the three investigated wavelengths.

Figure 16. DPF as a function of wavelength for the three tissue classes, with a qualitative explanation of the possible causes.


Figure 17. DPF as a function of wavelength in different individuals representing the three tissue classes (left to right: low, medium and high).


•  Determination  of  chromophore  concentrations  for  all  collected  spectra  of  the  subjects 
(12-36  months) 

The program used to determine the tissue chromophore concentrations is based on diffusion theory as an approximate solution to the transport equation with mismatched boundary conditions (e.g. a change in refractive index). It combines three elements: the fluence

$$\Phi(d) = \int_{4\pi} L(d, \hat{s})\, d\omega$$

as a function of the distance $d$ from the light source; Beer's law for the attenuation of light,

$$\frac{I}{I_0} = e^{-\mu_a d},$$

where $I$ can be determined from diffusion theory; and the fact that several known chromophores contribute to the absorption coefficient according to

$$\mu_a(\lambda) = \sum_{i=1}^{n} c_i\, \mu_{a,i}(\lambda),$$

where $c_i$ is the concentration of chromophore $i$ and $\mu_{a,i}$ its specific absorption, all quantities depending on wavelength. To model the wavelength-dependent light scattering, the function $\mu_s' = a\lambda^{-b}$ according to Pogue et al. [11] is employed. To simplify the equation set, we elected to set the fluence beyond the boundary to zero, which was realized experimentally by black absorbing surfaces around the source and detectors. Ultimately, one derives an equation system containing

$$\Phi(d) = \frac{4\, e^{-\mu_{\mathrm{eff}} d}}{1 + 2\mu_a/\mu_{\mathrm{eff}}}, \qquad \mu_{\mathrm{eff}} = \sqrt{3\mu_a(\mu_a + \mu_s')},$$

and the radiance into the two hemispheres, given by

$$F_+(d) = e^{-\mu_{\mathrm{eff}} d}, \qquad F_-(d) = q\, e^{-\mu_{\mathrm{eff}} d}, \qquad q = \frac{\mu_{\mathrm{eff}} - 2\mu_a}{\mu_{\mathrm{eff}} + 2\mu_a}.$$

To solve this equation system, the Nelder-Mead simplex method, a multidimensional minimization procedure, is used within MATLAB, considering oxyhaemoglobin, deoxyhaemoglobin, lipid, water and melanin to date. Figure 18 provides three examples of fits between the experimental and theoretical spectra, including the concentrations of the detected chromophores. Notably, the haemoglobin absorption peak at ~750 nm is only poorly captured, and the overall haemoglobin content of the tissue is underestimated. We are currently re-examining our code to identify a possible bug.
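The diffusion-theory forward model described above can be written compactly in NumPy. This sketch evaluates μa from chromophore concentrations, μs' from the power law (wavelength in nm), and the fluence and hemispheric fluxes. The array layout (one row per chromophore, one column per wavelength), the function names and the example numbers are assumptions; real chromophore spectra must be supplied by the user.

```python
import numpy as np

def absorption_coefficient(concentrations, specific_absorption):
    """mu_a(lambda) = sum_i c_i * mu_a,i(lambda); rows = chromophores, columns = wavelengths."""
    return np.asarray(concentrations, dtype=float) @ np.asarray(specific_absorption, dtype=float)

def reduced_scattering(wavelength_nm, a, b):
    """Power-law scattering mu_s' = a * lambda**(-b)."""
    return a * np.asarray(wavelength_nm, dtype=float) ** (-b)

def effective_attenuation(mu_a, mu_s_prime):
    """mu_eff = sqrt(3 mu_a (mu_a + mu_s'))."""
    return np.sqrt(3.0 * mu_a * (mu_a + mu_s_prime))

def fluence(d_cm, mu_a, mu_s_prime):
    """Phi(d) = 4 exp(-mu_eff d) / (1 + 2 mu_a / mu_eff)."""
    mu_eff = effective_attenuation(mu_a, mu_s_prime)
    return 4.0 * np.exp(-mu_eff * d_cm) / (1.0 + 2.0 * mu_a / mu_eff)

def hemispheric_fluxes(d_cm, mu_a, mu_s_prime):
    """Return F_+ = exp(-mu_eff d) and F_- = q F_+ with q = (mu_eff - 2 mu_a)/(mu_eff + 2 mu_a)."""
    mu_eff = effective_attenuation(mu_a, mu_s_prime)
    q = (mu_eff - 2.0 * mu_a) / (mu_eff + 2.0 * mu_a)
    f_plus = np.exp(-mu_eff * d_cm)
    return f_plus, q * f_plus

# Hypothetical example: two chromophores at the three frequency-domain wavelengths
wl = np.array([785.0, 808.0, 905.0])
mu_a = absorption_coefficient([0.5, 0.2], [[0.10, 0.08, 0.05], [0.02, 0.03, 0.04]])
mu_s = reduced_scattering(wl, a=8900.0, b=1.08)
phi = fluence(5.0, mu_a, mu_s)
```

As a consistency check on the equations above, the fluence equals 2(F+ + F-) for any μa and μs'.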


Figure 18. Examples of the decomposition of optical transillumination spectra into chromophore concentrations. The corresponding chromophore concentrations are given in Table 6 below.

Table 6. Chromophore concentrations and scattering parameters from the preliminary spectral deconvolution.

Patient (position)      Unit    449025865 (LC)  224903682 (LC)  802723643 (LC)
Water                   %       67.6            46.1            48.7
Lipids                  %       32.37           38.08           26.9
Hemoglobin              mol/L   1.89e-10        1.017e-12       1.351e-12
Oxy-hemoglobin          mol/L   2.78e-5         3.75e-5         8.634e-5
Melanin                 mol/L   0.000159        0.000178        0.00047
Scattering amplitude    cm-1    8900            8760            4790
Scattering power        -       1.081           1.047           1.0193
Error value             -       2.4956          3.1444          3.046
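The spectra summarized in Table 6 were fitted in MATLAB with the Nelder-Mead simplex method. A comparable Python sketch using SciPy's `minimize(..., method="Nelder-Mead")` is shown below; it is not the program used in this study. The Gaussian absorption bands are placeholders rather than real chromophore spectra, and fitting log-concentrations and the log scattering amplitude (which keeps them positive) is a design choice of this sketch, not necessarily of the original code.

```python
import numpy as np
from scipy.optimize import minimize

# Wavelength grid spanning the 625-1050 nm measurement range
WAVELENGTHS_NM = np.linspace(625.0, 1050.0, 86)

def gaussian_band(center_nm, width_nm, height):
    """Placeholder absorption band (cm^-1 per unit concentration); NOT real chromophore data."""
    return height * np.exp(-0.5 * ((WAVELENGTHS_NM - center_nm) / width_nm) ** 2)

# Hypothetical stand-ins for three chromophores; replace with tabulated spectra in practice
SPECIFIC_ABSORPTION = np.vstack([
    gaussian_band(760.0, 20.0, 0.5),
    gaussian_band(930.0, 15.0, 0.3),
    gaussian_band(978.0, 25.0, 0.5),
])

def model_attenuation(params, d_cm):
    """-log10 of the diffusion-theory fluence; params = [log c_1..log c_k, log a, b]."""
    log_c, log_a, b = params[:-2], params[-2], params[-1]
    mu_a = np.exp(log_c) @ SPECIFIC_ABSORPTION + 1e-4     # small floor keeps mu_a > 0
    mu_s_prime = np.exp(log_a) * WAVELENGTHS_NM ** (-b)   # power-law scattering
    mu_eff = np.sqrt(3.0 * mu_a * (mu_a + mu_s_prime))
    phi = 4.0 * np.exp(-mu_eff * d_cm) / (1.0 + 2.0 * mu_a / mu_eff)
    return -np.log10(phi)

def fit_spectrum(measured, d_cm, x0):
    """Least-squares fit of the model to a measured attenuation spectrum with Nelder-Mead."""
    def cost(params):
        return float(np.sum((model_attenuation(params, d_cm) - measured) ** 2))
    result = minimize(cost, x0, method="Nelder-Mead",
                      options={"maxiter": 5000, "xatol": 1e-6, "fatol": 1e-10})
    return result, cost
```

Nelder-Mead needs no derivatives, which suits a model assembled from tabulated spectra, but it can stall in local minima; restarting from several initial guesses is a common safeguard.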

•  Establishing an initial model identifying chromophore concentration ranges/ratios that show a possible correlation with tissue density (20 months and 28 months).

As discussed above, the fitting program is still being debugged, so this task is still pending.
Pending data analysis (anticipated completion: items 1-6, Q4 2003; items 7 and 8, Q1 2005)

The  following  is  a  list  of  planned  analyses  of  the  present  data  set,  including  some  of  the  pending 
tasks  proposed  during  the  initial  funding  period  and  some  extending  beyond  those  proposed. 

1. PCA of OTS versus tissue density based on the classification of each quadrant.

2.  PLS  of  OTS  versus  %  tissue  density  based  on  each  quadrant. 

3.  PLS  of  OTS  versus  risk  score  based  on  Gail  Model  (n=162). 

4.  Tissue  chromophore  extractions  from  OTS. 

5.  PLS  of  tissue  chromophore  extraction  versus  tissue  density  based  on  the  classification  of 
each  quadrant. 

6.  PLS  for  same  data  as  point  5  but  versus  %  tissue  density  based  on  each  quadrant. 

7.  Detailed  analysis  of  outliers  from  analyses  1  to  6. 

8. Analysis of whether the wavelength-dependent DPF provides diagnostic risk assessment.

Key  Research  Accomplishments 

•  Demonstrated  the  feasibility  of  OTS  as  a  method  to  determine  breast  cancer  risk. 

•  Established analysis methods that identify women with high versus low tissue density with a sensitivity and specificity each larger than 0.97.

•  Through the establishment of de facto equivalence between OTS and the mammographic density pattern, OTS will provide at least the same odds ratio or relative risk as tissue density in identifying women at high breast cancer risk.

•  Demonstration  that  OTS  can  provide  anatomical  and  physiological  information  about  the 
breast  at  risk,  by  providing  tissue  density  and  tissue  chromophore  information. 
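The sensitivity and specificity quoted above follow the standard definitions, sensitivity = TP/(TP + FN) and specificity = TN/(TN + FP), with "positive" meaning high tissue density. A minimal sketch; the function name and the 0/1 (or boolean) label encoding are illustrative assumptions.

```python
def sensitivity_specificity(true_high, predicted_high):
    """Sensitivity and specificity for a high (positive) versus low (negative) density call."""
    pairs = [(bool(t), bool(p)) for t, p in zip(true_high, predicted_high)]
    tp = sum(1 for t, p in pairs if t and p)          # high density, called high
    fn = sum(1 for t, p in pairs if t and not p)      # high density, called low
    tn = sum(1 for t, p in pairs if not t and not p)  # low density, called low
    fp = sum(1 for t, p in pairs if not t and p)      # low density, called high
    return tp / (tp + fn), tn / (tn + fp)
```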

Reportable  Outcomes 

Manuscripts  submitted  for  peer  review  (two  attached) 

Non  Ionizing  Near  Infrared  Radiation  Transillumination  Spectroscopy  for  Breast  Tissue  Density 
and  Breast  Cancer  Risk  Assessment,  by  Michelle  K.  Simick,  Roberta  Jong,  Brian  C.  Wilson  and 
Lothar  Lilge,  submitted  to  Journal  of  Biomedical  Optics,  May  2003. 

Classification of breast tissue density by Optical Transillumination Spectroscopy: optical and physiological effects governing predictive value, by Kristina Blyschak, Michelle Simick, Roberta Jong and Lothar Lilge, submitted to Cancer Research, July 2003.

Manuscripts  in  preparation  for  peer  review 

1. Manuscript detailing the analysis based on density given on an interval scale.
2. Manuscript detailing the extraction of the chromophore concentrations.

3.  Manuscript  comparing  analysis  on  an  interval  scale  for  a  physical  risk  standard  (density) 
versus  a  demographics  based  standard  (Gail  Score). 

M.Sc.  Thesis 

Near Infrared Transillumination Spectroscopy of Breast Tissue for Correlation with Mammographic Density, by Michelle K. Simick. A thesis submitted in conformity with the requirements for the degree of Master of Science, Graduate Department of Medical Biophysics, University of Toronto, September 2002.


Invited presentations


Optical  Transillumination  Spectroscopy  of  Breast  Tissue:  Correlation  to  Parenchymal  Density 
Patterns  and  Cancer  Risk,  Lothar  Lilge,  Wellman  Laboratories  of  Photomedicine,  MGH,  Boston, 
USA, November 2001.

Optical  Transillumination  Spectroscopy  of  Breast  Tissue:  Non-Imaging  pre-screening  for  women 
of  all  ages,  Lothar  Lilge,  Institute  for  Laser  Medicine,  Ulm,  Germany,  December  2002. 

Optical  Transillumination  Spectroscopy,  better  prescreening  of  women?  Lothar  Lilge,  Kristina 
Blyschak,  Breast  Club,  Princess  Margaret  Hospital,  February  2003. 

Colours  tell  thy  risk!  Can  spectroscopy  be  used  in  preventive  oncology?  Lothar  Lilge,  Kristina 
Blyschak,  Michelle  Simick,  Roberta  Jong,  Brian  C.  Wilson,  Engineering  Conferences 
International,  Banff,  Canada,  3-7  August  2003. 

Contributed presentations

Transillumination Spectroscopy for Breast Cancer Risk Assessment, Michelle Simick, Roberta Jong, Brian C. Wilson, Lothar Lilge, Photonics North, Rochester NY, 2001. (This presentation won the best student presentation prize.)

Classification of breast tissue density by Optical Transillumination Spectroscopy (OTS): optical and physiological effects governing predictive value, Kristina Blyschak, Michelle Simick, Roberta Jong, Lothar Lilge, Optical Society of America, Photonics North, Montreal, Canada, May 2003.

Optical  Transillumination  Spectroscopy  of  Breast  Tissue  for  Cancer  Risk  Assessment,  Lothar 
Lilge, Kristina Blyschak, Michelle Simick, Roberta Jong, Proceedings of the Society of Opto-electronic Systems Engineers, San Jose, USA, January 2003.

Optical  Transillumination  Spectroscopy  a  possible  technique  to  assess  Breast  Cancer  Risk? 
Michelle Simick, Kristina Blyschak, Roberta Jong, Norman Boyd, Brian C. Wilson and Lothar
Lilge,  Proceedings  of  the  Society  of  Opto-electronic  Systems  Engineers,  Volume  5141,  Munich, 
Germany,  June  2003. 




Patent 


•  US provisional patent submitted November 20, 2001. Title: Optical Transillumination Spectroscopy to quantify disease risk. Inventors: Lothar Lilge, Brian C. Wilson, Michelle Simick and Norman Boyd.

•  Full  US  application  submitted  November  2002. 

•  PCT international Patent Application No. PCT/CA02/01771, filed November 20, 2002, with a publication date of May 30, 2003.


Programs  and  databases 

A MATLAB-based program for PCA, PLS and chromophore analysis of the transillumination spectra was established; its key features are:

•  Online  sorting  of  subgroups  from  volunteer  population 

•  Calculation  of  3D  iso-probability  ellipsoids  for  clusters  under 
investigation 

•  Optimized  quantification  of  HDM  and  LDM  by  establishing  the  tangent 
plane  between  ellipsoids. 
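The iso-probability ellipsoids listed above can be sketched by treating the first three component scores of each density group as a trivariate normal cluster: a point lies inside the 95% ellipsoid when its squared Mahalanobis distance from the cluster mean is below the chi-square quantile for 3 degrees of freedom (about 7.815). The normality assumption, the 95% level and the function names are assumptions of this sketch; the tangent-plane optimization between ellipsoids is not reproduced.

```python
import numpy as np

CHI2_95_3DOF = 7.814727903251178  # 95% quantile of chi-square with 3 degrees of freedom

def cluster_ellipsoid(scores):
    """Mean and covariance of one cluster's (n_samples, 3) component scores."""
    scores = np.asarray(scores, dtype=float)
    return scores.mean(axis=0), np.cov(scores, rowvar=False)

def mahalanobis_sq(points, mean, cov):
    """Squared Mahalanobis distance of each point (row) from the cluster mean."""
    diff = np.atleast_2d(np.asarray(points, dtype=float)) - mean
    solved = np.linalg.solve(cov, diff.T).T   # cov^-1 (x - mean) for each point
    return np.sum(diff * solved, axis=1)

def inside_ellipsoid(points, mean, cov, threshold=CHI2_95_3DOF):
    """True where a point falls within the iso-probability ellipsoid."""
    return mahalanobis_sq(points, mean, cov) <= threshold

def ellipsoid_semi_axes(cov, threshold=CHI2_95_3DOF):
    """Semi-axis lengths and directions (columns) of the ellipsoid."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    return np.sqrt(threshold * eigvals), eigvecs
```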

The figures below show screen prints of the GUI: the main analysis menu and, using the PCA interface as an example, the subgroup selection screen.


[Screen print: main window of the GUI, "Optical Transillumination Spectroscopy: No Risk to Know Your Risk" (University Health Network), with analysis selections for Principal Component Analysis, Cluster Separation and Chromophore Concentrations.]

[Screen print: PCA Interface, subgroup selection screen.]

For each volunteer, the current study database comprises 8 spectra (more for repeat measurements) and the complete information from the original questionnaire; in addition, about 160 volunteers were contacted by telephone to obtain additional demographic information.

Conclusions 

This study demonstrates for the first time that tissue modifications preceding the development of cancer can be detected and possibly quantified. While some of these modifications need not have a causal relationship with the cancer, the ability to quantify tissue constituents, such as the chromophores mentioned above, may provide additional information for oncologists interested in oncogenesis and prevention.

Optical spectroscopy shows at least the same predictive value as risk assessment based on ionizing radiation, but can be applied to women at a younger age and more frequently. Hence, when employed in preventive oncology, it offers women two distinct advantages. First, because it can be used at an earlier age than mammography, any prevention strategy has more time to exert its influence, so strategies with fewer negative effects on quality of life can be explored. Second, because it can be used frequently, the efficacy of the intervention strategy can be monitored.

In  either  case  women  are  empowered  to  make  more  educated  decisions  regarding  the  risks  and 
benefits  of  the  intervention  strategy. 


Planned  studies. 

The  current  work  was  very  successful  in  identifying  possible  new  factors  or  traits  for  risk 
assessment  using  tissue  density  as  an  intermediate  outcome. 

The program's future work is aimed at establishing the odds ratio or relative risk of OTS directly against breast cancer and at demonstrating its ability to monitor risk changes in individual women undergoing a risk-reduction intervention. Applications addressing these two points have been submitted to the NIH (June 2003) and the CDMRP (March 2003), respectively. Another application, to the Susan Komen Foundation, is planned for September 2003.

Once the three different groups of risk factors or traits are obtained (for risk, density and possibly protection), the risk factors themselves can be compared, and insight can be gained into the physical contributions of density to risk, of density to protection, etc.

Expanding OTS to non-oncological risk assessment is also being considered for future work.
¹Department of Medical Biophysics, University of Toronto, ³Mount Sinai Hospital,
⁴University Health Network,

610  University  Avenue,  Toronto,  Ontario,  M5G2M9  416  946  4501  x  5743, 
llilge@uhnres.utoronto.ca 

MKS  is  currently  with  the  Toronto  Sunnybrook  Regional  Cancer  Centre  and  RJ  is 
currently  with  The  Sunnybrook  and  Women’s  College  Health  Sciences  Center 


Keywords: 

Cancer  Risk,  Transillumination  Spectroscopy,  Breast  Cancer,  Cancer  Prevention 


Abstract 


There is increasing attention on cancer prevention as a means to reduce cancer incidence rates. Prevention interventions or therapies in turn rely on risk assessment programs to identify the women most likely to benefit from education and lifestyle changes. These programs are usually based either on interviews to identify ethnic, genetic and lifestyle factors contributing to risk, or on physical examination of the breast. For the latter, it has been shown that the parenchymal density pattern observed on x-ray mammography can be used to assess an individual's risk. Extensive areas of dense, glandular tissue that are relatively radio-opaque are associated with higher breast cancer risk, with an odds ratio of 4-6 compared with women whose breast density is low because of an abundance of adipose tissue.

Near-infrared optical transillumination spectroscopy has been used previously to investigate physiological properties of breast tissue. In this study, women who had recently undergone x-ray mammography were recruited, and their tissue density was assessed by a radiologist. They then underwent optical transillumination spectroscopy, for which an instrument was developed that delivers visible and near-infrared light to the breast. After transmission through the breast craniocaudally in one of 4 quadrants, the light spectrum from 625 to 1050 nm was measured. The spectra were used as input to Principal Component Analysis (PCA), with the corresponding mammographic density as the reference standard. The study group comprised 92 women aged 39 to 72 years. Without further stratification for age, menopausal status or measurement position, the PCA numerical model predicted the radiological assessment of tissue density with an accuracy in the mid-80% to low-90% range.


Introduction 


Breast cancer is the most commonly occurring cancer in women. In Canada, the lifetime risk of being diagnosed with breast cancer is approximately 1 in 10,¹ the highest of all cancers in women. The probability of dying of breast cancer is 1 in 25, second only to lung cancer among all cancer-related deaths.¹ Most other developed countries report similar probabilities for diagnosis and death. Breast cancer screening programs have been shown to decrease the mortality rates of women aged 50-69, since cancers are detected at an earlier, more easily curable stage. However, the overall incidence rate of breast cancer is still rising, possibly because of the increasing age of the population.³ Currently, x-ray mammography, ultrasound and/or magnetic resonance imaging are the primary modalities used for breast imaging.⁴ These modalities use physical or chemical differences in tissue, such as the radiation attenuation coefficient, water content or physical density, to display differences in tissue morphology that may suggest aberrant growth associated with cancer.

While understanding of the mechanisms leading to breast cancer is increasing, they are not fully understood, although it is apparent that the development of breast cancer is a slow process following initial transformation of the breast tissue.⁵ There is currently an effort within the research community to understand risk factors for the disease that are exhibited before or during this slow transformation process, but definitely prior to any clinical manifestation of breast cancer. This would enable members of the highest-risk population to make educated decisions about increased screening and/or risk-reduction interventions. Risk factors are defined as those characteristics that are more common in people with the disease than in the population at large.⁶ Risk factors for breast cancer include age, country of residence, a personal or first-degree family history of breast disease, genetic factors, anthropometric factors, and menstrual and physiological factors.

Screening and risk-reduction interventions do not benefit every individual member of the high-risk population; rather, they benefit the population at risk as a whole. This benefit is maximized when the relative risk quantifier employed is very large, so that the majority of high-risk group members are identified while minimizing the inclusion, and hence the exposure, of low- or medium-risk subjects to potential side effects of the risk-reduction interventions. Risk-reduction interventions can be as benign as modifications to a subject's lifestyle, exercise and diet, which have been shown to reduce the relative area of mammographic densities after two years,⁷ or invasive, such as chemoprevention with tamoxifen⁸ or aromatase inhibitors,⁹ or prophylactic mastectomy.¹⁰

Increased fibroglandular tissue in the breast, which has a high x-ray attenuation coefficient and thus appears bright in standard mammograms, is a known physiological risk factor. Radiologically lucent areas represent fatty tissue, which is rarely the source of aberrant growth in the breast. Radiologically opaque tissue is a common source of carcinomas, and consequently the relative area of dense tissue is a strong risk factor; see Figure 1 for examples of breasts with high and low x-ray density. Breast tissue density is commonly quantified following breast cancer screening program visits, and it has been suggested that it can be affected by hormonal and dietary changes. Parenchymal density is used as the standard risk assessment tool in the study presented here, as it provides the best available standard for risk in a cross-sectional study.

Breast tissue is a highly light-scattering medium with relatively low absorption in the red and near-infrared wavelength range, resulting in an adequate penetration depth of light. This allows a sufficient number of photons to be detected through up to 7 cm of breast tissue in a few seconds, while keeping the incident irradiance below government guidelines for skin exposure.¹³

Previous diagnostic studies of breast tissue have shown that quantification of water, lipids, hemoglobin and other tissue chromophores is feasible by near-infrared spectroscopy.¹⁴ Fibroglandular tissue is expected to show increased water absorption and simultaneously decreased lipid absorption, identifiable through absorption peaks at 978 and 930 nm, respectively¹⁵ (Figure 2). It is also expected to have a higher scattering efficiency than adipose tissue, as seen in Figure 3. Finally, haemoglobin (Hb) can be identified by an absorption peak at 760 nm, while oxygenated haemoglobin (HbO₂) has only a low and broad absorption with a local maximum close to 920 nm.¹⁵ Transillumination spectroscopy has been shown to be associated with the probability of the presence of breast cancer.¹⁶

Light remitted from the opposite side of the breast passes through the skin at least twice, and the skin's varying melanin content (depending on ethnicity and sun exposure) can affect the transmission spectrum. This may limit the predictive value of transillumination spectra, as melanin content does not affect breast cancer risk. While quantification of skin color is feasible with diffuse reflectance spectroscopy¹⁷ and can permit subtraction of melanin-associated absorption, it is not included in this study, and participants were not stratified by skin color or ethnic background.

Optical transillumination spectroscopy is not an imaging technique; thus only bulk tissue properties are obtainable, and these are characterized through analysis of spectral shape and intensity. Hence, for comparison with mammographically determined risk, the x-ray images were classified only as low, medium or high tissue density, omitting spatial information about the density pattern.

This investigation is set up as a cross-sectional study to evaluate the feasibility of detecting and quantifying breast tissue density in vivo, as an intermediate to risk prediction, using visible and near-infrared transillumination spectroscopy. The hypothesis is that optical transillumination spectroscopy provides information consistent with conventional mammography in quantifying breast tissue density and hence, indirectly, breast cancer risk, with an odds ratio comparable to that of mammography.

Methods 

Instrumentation. The clinical spectrographic system, designed and built in-house, is shown schematically in Figure 4. A 12 W halogen lamp (Welch Allyn, Buffalo, NY) with a stabilized power supply was used as the broadband light source. The ultraviolet and short-visible regions of the spectrum were blocked by a cut-off filter (<550 nm), and the mid-infrared region by a heat rejection filter (KG4, Melles Griot, Carlsbad, CA). The remaining light in the 550-1300 nm range was coupled by a 20 mm focal length lens into a 5 mm diameter liquid light guide (Kaiser Electronics, San Jose, CA), placed in contact with the top of the breast. The total radiant power delivered to the skin surface was >250 mW. The transmitted light was collected by a custom-made 7 mm diameter optical fiber bundle (P&P Optical, Kitchener, ON, Canada) positioned coaxially with the source guide. The light guides were mounted in a caliper whose separation could be adjusted by hand so that both were in contact with the breast. Contact of the source guide was firm, with the breast compressed locally by no more than 5 mm to ensure good coupling to the tissue. The holder for the source guide and the plate in which the detector guide was embedded were made of black plastic to model matched boundary conditions. During spectral measurements, the subject was seated with each breast in turn resting comfortably on a support plate whose height could be adjusted manually. No pretreatment of the skin surface was required.

The collected light was spectrally dispersed using a high-throughput holographic grating (15.7 lines/mm, Kaiser, Carlsbad, CA, USA) with a 0.5 mm entrance slit and detected with a 2D, liquid-nitrogen-cooled, back-thinned silicon CCD array (F-125, Photometrics, NJ, USA). The spectral resolution was <3 nm (FWHM) over the 625-1060 nm bandwidth. The peak quantum efficiency of the detector was >0.8 at 780 nm, falling to 0.2 at 1100 nm. The entrance slit of the spectrometer was imaged onto 50 rows of the CCD, thus increasing the dynamic range by over 25. The dark count was ~0.06 electrons per hour. Further noise reduction was achieved using exposure times of 2-3 seconds and averaging up to 5 scans. The system dynamic range was >5 OD (optical densities), with a signal-to-noise ratio ranging from >10 to 10^4 across the spectral range.

This study was approved by the Institutional Review Boards of the University of Toronto and the University Health Network, and informed consent was obtained from all participants. Women were recruited through the Marvelle Koffler Breast Centre at Mount Sinai Hospital, Toronto. All had prior mammograms within 12 months of the spectral measurement, classified by a radiologist (RJ) as low (<25%), medium (25-75%) or high (>75%) tissue density. Women showing large differences between the two breasts were not included in this analysis.
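The radiologist's three-level scale can be expressed as a simple mapping from the percentage of dense area to a density class. How values exactly at 25% and 75% were assigned is not stated, so the boundary handling below (both boundaries counted as medium) and the function name are assumptions.

```python
def density_class(percent_dense):
    """Map percent mammographic density to low (<25%), medium (25-75%) or high (>75%)."""
    if not 0.0 <= percent_dense <= 100.0:
        raise ValueError("percent_dense must lie between 0 and 100")
    if percent_dense < 25.0:
        return "low"
    if percent_dense <= 75.0:   # assumption: 25% and 75% are both classed as medium
        return "medium"
    return "high"
```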

Measurement  procedure  and  Spectral  Preprocessing. 

The total data acquisition time was approximately 15 minutes, and measurements were performed in complete darkness. A total of 8 spectra were collected per subject, representing the medial, distal, lateral and central quadrants of each breast. To date, a total of 92 women have been entered in the study, of whom 58 are post-menopausal. The wavelength dependence of the system sensitivity was corrected daily by normalizing the transillumination spectra to spectra measured through a 1 cm thick standard of ultra-high-density polyurethane (Gigahertz Optics, Munich, Germany), which has a very flat attenuation spectrum. All tissue spectra are given as Optical Density (OD) relative to this standard. Further pre-processing of the spectra included correction for tissue thickness by calculating the OD/cm at each wavelength. Auto-scaling of the spectra, i.e. normalizing each spectrum to the average spectrum of all spectra contained in the training set used for PCA model development (see below), was also evaluated; spectra in the validation set were scaled using the same mean spectrum. Table 2 lists the different pre-processing techniques employed to establish a correlation between the spectral data set and the breast tissue density.
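The pre-processing chain described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes dark-corrected intensity spectra sampled on a shared wavelength grid, takes OD as log10 of the reference-to-tissue intensity ratio, and interprets auto-scaling as subtraction of the training-set mean spectrum (consistent with the later reference to a "subtracted mean spectrum"). The function names are hypothetical.

```python
import numpy as np


def optical_density(i_tissue, i_reference):
    """OD of a tissue spectrum relative to the reference standard.

    Assumes dark-corrected intensities sampled on the same wavelength grid.
    OD = log10(I_reference / I_tissue), so stronger attenuation gives larger OD.
    """
    i_tissue = np.asarray(i_tissue, dtype=float)
    i_reference = np.asarray(i_reference, dtype=float)
    return np.log10(i_reference / i_tissue)


def thickness_corrected(od, thickness_cm):
    """Convert OD spectra (one per row) to OD/cm using each breast thickness."""
    od = np.atleast_2d(np.asarray(od, dtype=float))
    thickness = np.asarray(thickness_cm, dtype=float).reshape(-1, 1)
    return od / thickness


def autoscale(train, validation):
    """Subtract the training-set mean spectrum from both sets.

    Assumed interpretation of 'auto-scaling': the validation set is referenced
    to the training mean only, so no validation data enter the model.
    """
    train = np.asarray(train, dtype=float)
    validation = np.asarray(validation, dtype=float)
    mean_spectrum = train.mean(axis=0)
    return train - mean_spectrum, validation - mean_spectrum, mean_spectrum
```

Referencing the validation spectra to the training mean keeps the validation set fully independent of model development, which is the point of holding it out.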

Data  analysis. 

The radiological classification produces a scalar quantity, namely the mammographic density, whereas each optical spectrum is a vector. Hence, only multivariate analysis techniques that can accept such parameters, and that have been used extensively in applications requiring the analysis of complex spectra, were considered.18,19 Typically, these methods involve first a ‘training’ step to identify the variance within a set of spectra and subsequently a ‘prediction’ or ‘validation’ step to determine how accurately a separate set of spectra predicts the outcome, which in this case is the tissue density classification. The specific analytic technique used here is Principal Component Analysis (PCA).

Mathematically, the PCA procedure is as follows. First, the dimensionality of the spectral data is reduced while preserving the maximum amount of variance.20 This is accomplished by computing the covariance (or correlation) matrix of the n × m data matrix X, whose rows are the measured spectra (n = 544; training set only) and whose columns span the spectral range (m = 436 wavelengths), such that, for mean-centered X:

$$\operatorname{cov}(X) = \frac{X^{T} X}{n-1} \qquad (1)$$


PCA decomposes the data matrix X into the sum of the outer products of the score vectors t_i and the loading vectors p_i, plus a residual matrix E:

$$X = t_1 p_1^{T} + t_2 p_2^{T} + t_3 p_3^{T} + \dots + t_i p_i^{T} + E \quad \text{or} \quad X = T P^{T} + E \qquad (2)$$

where the elements of the t_i (n × 1) vectors are the scores, which contain information on how the spectra relate to each other, and the p_i (m × 1) vectors, or components, are the eigenvectors of the covariance matrix, which describe how the wavelengths co-vary.
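A compact sketch of Eqs. (1) and (2), assuming the rows of X are spectra and the columns wavelengths. It mean-centers X first (Eq. (1) is the covariance only for mean-centered data) and takes the eigenvectors of the covariance matrix, sorted by decreasing eigenvalue, as the loadings p_i. Eigenvector signs are arbitrary, so an implementation may return a given p_i and t_i with flipped sign. The function name `pca_fit` is hypothetical.

```python
import numpy as np


def pca_fit(X, n_components=4):
    """PCA of an (n spectra x m wavelengths) matrix via its covariance matrix.

    Returns the training mean, loadings P (m x k), scores T (n x k) and the
    fraction of total variance explained by each retained component.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    mean = X.mean(axis=0)
    Xc = X - mean                          # mean-center so cov = Xc^T Xc / (n - 1)
    cov = Xc.T @ Xc / (n - 1)              # Eq. (1), an m x m symmetric matrix
    eigvals, eigvecs = np.linalg.eigh(cov) # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # reorder by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    P = eigvecs[:, :n_components]          # loadings p_i as columns
    T = Xc @ P                             # scores t_i as columns, Eq. (2)
    explained = eigvals[:n_components] / eigvals.sum()
    return mean, P, T, explained
```

In the setting described here, X would be the 544 × 436 training matrix, and the `explained` fractions are the quantity that percentages of variance per component describe.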

The scores (the elements of t_i) can be plotted against one another to show clustering of related spectra. The PCA model was built on the training set (n = 544), and the same mathematical model, i.e. retaining the p_i, was used to determine the scores t_i for the validation set comprising the remaining nv = 192 spectra.
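Applying a trained model to new spectra amounts to subtracting the training mean and projecting onto the retained loadings. The sketch below also shows a subject-wise split; that all 8 spectra of a woman fall in the same set is an assumption, though it is consistent with Table 1 (68 training and 24 validation subjects, each contributing 8 spectra, give 544 and 192 spectra). Both function names are hypothetical.

```python
import numpy as np


def split_by_subject(subject_ids, validation_subjects):
    """Boolean masks that keep every spectrum of a subject in the same set."""
    subject_ids = np.asarray(subject_ids)
    is_validation = np.isin(subject_ids, list(validation_subjects))
    return ~is_validation, is_validation


def project_scores(X_new, train_mean, P):
    """Scores of new spectra on a fixed PCA model: t = (x - training mean) @ P."""
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
    return (X_new - train_mean) @ P
```

Keeping a subject's spectra together prevents near-identical bilateral spectra from appearing in both sets, which would otherwise inflate validation performance.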

The predictive performance of the PCA model was assessed using the high density measure (HDM), defined as the proportion of spectra from women categorized by the radiologist as having high tissue density that the PCA algorithm also predicts as having high mammographic density. Conversely, the low density measure (LDM) represents the ability to correctly identify those spectra that represent low tissue density. Hence, the HDM and LDM are similar to sensitivity and specificity, respectively.
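Given the radiologist's labels and the predictions of whatever decision rule is applied in the t1 vs. t2 plane, HDM and LDM can be computed as below. This sketch assumes only spectra from high- and low-density women are scored (medium-density cases excluded); the function name is hypothetical.

```python
import numpy as np


def density_measures(radiologist_high, predicted_high):
    """HDM and LDM as sensitivity- and specificity-like fractions.

    radiologist_high: True for spectra from women rated high density by the
                      radiologist, False for low density.
    predicted_high:   True where the PCA-based rule predicts high density.
    """
    truth = np.asarray(radiologist_high, dtype=bool)
    pred = np.asarray(predicted_high, dtype=bool)
    hdm = np.mean(pred[truth])      # high-density spectra correctly called high
    ldm = np.mean(~pred[~truth])    # low-density spectra correctly called low
    return float(hdm), float(ldm)
```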

Results 

The data set includes mammograms and spectral results from 92 subjects (age 36 to 72 years). Fifty-eight women were post-menopausal, of whom 33 were classified as having low, 18 medium and 7 high mammographic density. Of the 34 pre-menopausal women, 5, 18 and 11 were classified as having low, medium and high density, respectively (Table 1). At present, this classification prevalence does not reflect the general population distribution observed in the Canadian National Breast Screening Study,21 but recruitment is ongoing.

Figure  5  shows  a  typical  set  of  measurements,  comprising  8  spectra  from  a  single 
subject.  Spectra  from  corresponding  quadrants  on  each  breast  are  very  similar,  a  fact  used 
by  Egan  and  Doyle  16  as  a  negative  predictor  for  the  presence  of  breast  cancer. 

Although transillumination is a local measurement, a large tissue volume is interrogated at each position (estimated as 25 cm3 for a 5 cm breast thickness). For positions close to the circumference of the breast, boundary losses reduce the overall intensity of the transmitted light and could influence the spectral shape, which would limit the predictive value of the transillumination technique. Hence, repeat measurements were made in one subject, starting at the center position and moving towards the medial position and beyond towards the circumference of the breast. The resulting transillumination spectra, shown in Figure 6, indicate that although the overall intensity (and hence the OD/cm) is affected, the losses are wavelength independent up to a distance of 1 cm from the circumference of the breast and so do not affect the analysis.
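One way to quantify "wavelength independent" losses, in the spirit of Figure 6B, is to divide each spectrum by the mean of all positions and measure how much each ratio varies across wavelength. This sketch assumes the spectra are in linear transmitted-intensity units, where a wavelength-independent loss is a constant multiplicative factor and therefore gives a flat ratio; the function name and the relative-spread metric are illustrative choices, not the authors' analysis.

```python
import numpy as np


def boundary_ratio_flatness(spectra):
    """Ratio of each spectrum to the mean over positions, plus its flatness.

    spectra: (k positions x m wavelengths) array of transmitted intensities.
    Returns the ratio spectra and, per position, std/mean of the ratio across
    wavelength (0 means the loss is perfectly wavelength independent).
    """
    spectra = np.asarray(spectra, dtype=float)
    ratio = spectra / spectra.mean(axis=0)
    spread = ratio.std(axis=1) / ratio.mean(axis=1)
    return ratio, spread
```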

The reproducibility of the optical transillumination measurements was analyzed by repeating the procedure on one subject during visits over a period of 18 months. Figure 7 shows the correlation of the t1 and t2 scores from these repeat spectra. The component scores (t1 and t2) vary between quadrants but cluster tightly for a given position, indicating that the spectroscopy data are reproducible.
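The slope and Pearson correlation coefficient quoted for repeat (Figure 7) and bilateral (Figure 13) comparisons can be computed as sketched below. An ordinary least-squares line with an intercept is assumed; whether the original fits were forced through the origin is not stated. The function name is hypothetical.

```python
import numpy as np


def agreement(first, second):
    """Least-squares slope/intercept and Pearson r between paired scores."""
    x = np.asarray(first, dtype=float)
    y = np.asarray(second, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)   # highest power first
    r = np.corrcoef(x, y)[0, 1]
    return float(slope), float(intercept), float(r)
```

A slope near 1 with a high r indicates that scores from one visit (or one breast) reproduce those from the other.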

Figure 8 shows the principal components (p1 to p4) resulting from the PCA using n = 544 corrected spectra. Components p1 to p4 represent, respectively, 97.6, 1.2, 0.6 and 0.3% of the variance in the total data set, for a combined 99.8% of the variance. The cluster plot of the scores t1 and t2 is shown in Figure 9, illustrating discrimination of breast tissue density across a diagonal line in the t1 vs. t2 space. Figure 10 shows a randomly selected thickness-corrected transillumination spectrum reconstructed from the variance captured only by p1 and p2, as well as from that captured by the first four components. The reconstruction from all four components represents the measured spectrum well.
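A reconstruction like that in Figure 10 follows directly from Eq. (2): project the spectrum onto the first k loadings and add the training mean back. A minimal sketch, assuming orthonormal loadings as columns of P (as PCA produces); the function name is hypothetical.

```python
import numpy as np


def reconstruct(x, mean, P, k):
    """Approximate spectrum x from the first k loadings: mean + sum_i t_i p_i."""
    x = np.asarray(x, dtype=float)
    Pk = P[:, :k]              # first k loading vectors (m x k)
    t = (x - mean) @ Pk        # scores of this spectrum on those loadings
    return mean + Pk @ t
```

Components beyond k are what remains in the residual E, so the reconstruction error shrinks as k grows.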


Spectra that had not been corrected for thickness were used to determine the effect of thickness on the shape of the component vectors p1 to p4 (Figure 11) and on the resulting cluster plot of t1 vs. t2 (Figure 12). The component spectra are very similar to the thickness-corrected components, but the cluster plot of t1 vs. t2 shows discrimination as a function of t2 only. Component spectra and cluster plots were also obtained for autoscaled and transfer function corrected spectra (data not shown). The resulting HDM and LDM values for the different spectral pre-processing methods are shown in Table 2.

Symmetry across corresponding bilateral quadrants for each individual is shown in Figure 13 for all t1 and t2 scores derived from thickness corrected spectra, reflecting a pool of women with homogeneous densities across both breasts.

Discussion 

Bilateral symmetry in the spectra at corresponding quadrants (Figure 13) is expected in our study population, as it is a criterion indicating the absence of breast cancer according to the previous studies by Egan and Dolan.16

Autoscaling of the spectra prior to PCA modeling removes some spectral information, since the subtracted mean spectrum is wavelength dependent. As the spectral features contributing to the discrimination between high and low breast density, or risk, are unknown, losing spectral information is not advisable, even though no significant loss in HDM and LDM was noted. Additionally, component spectra calculated after autoscaling cannot be used as principal component filters in future work, as suggested elsewhere.22


Cluster plots (Figure 9) based on the scores t1 and t2 resulting from thickness corrected spectra demonstrate that it is possible to differentiate between subjects having low or high breast tissue densities.

While PCA models for both native and thickness corrected spectra enable differentiation between high and low breast tissue densities, their t1 vs. t2 cluster plots differ. One obvious explanation is the effect of the physical tissue thickness on the overall variance within the spectral data set. In the model based on data not corrected for thickness, p1 cannot differentiate between high and low density tissue. Additionally, the range of t1 values is smaller in the thickness corrected data (Figure 9) than in the non-thickness corrected data (Figure 12). This indicates that thickness contributes to the magnitude of t1, masking other contributions that could differentiate between tissue densities, such as light scattering, and thus leaving only t2 to preserve the information distinguishing between the two breast tissue density groups. The first principal component spectrum, p1, based on the thickness corrected spectra, is de facto wavelength independent but includes losses due to optical path length (and therefore light scattering) and losses at the circumference of the breast.

Pre-processing of the spectra, including thickness correction, is clinically relevant, since thickness is a controllable parameter and it has been shown that thickness contributes non-uniformly to the spectra because lower density correlates with larger breasts.12

When comparing the autoscaled and non-autoscaled data, there were minimal changes in the principal component spectra and minor differences in HDM and LDM values (Table 2). Autoscaling as part of the pre-processing can degrade regions with flat or extreme spectral variation.19 Here, degraded spectral features could include regions of the spectrum with minimal wavelength dependence and hence a first derivative close to zero. For example, the hemoglobin inflection points are more pronounced in the non-autoscaled components than in the autoscaled components. Conversely, the spectral features of water and lipids are large compared to other structures in the spectra, but are less pronounced after autoscaling. In this study, the only difference in model performance is in the training set, where non-autoscaled spectra give HDM and LDM values about 2% higher.

Principal components can reveal particular regions of the spectrum that represent important physical properties or entities within the tissue that contribute to differentiation. Component spectra p1 and p2 are the most important and account for the highest amount of variance in the data set. Components 3 and 4 have shapes similar or inverse to that of component 2, but account for less of the variance.

The derivation of OD used here, which is based on calibrating the wavelength-dependent transfer function with a polyurethane block with high Mie scattering, resulted in the surprisingly flat spectral shape of the first principal component spectrum, p1, because the wavelength-dependent Mie scattering cancels when the ratio of the two spectra is taken. Hence, p1 carries optical scattering information despite not showing the typical wavelength dependency,23 and the inverse of t1 represents the overall scattering power. Low density tissue spectra show less scattering than high density tissue spectra and, therefore, higher values of t1, as seen in Figure 9. This relationship in scattering properties is also seen in the scattering coefficient data of Peters et al.24 and Troy et al.11
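As a simple illustration of the referencing step, assume (for this sketch only) that the detected intensity factors into a source spectrum I_0(λ), a system response S(λ) and the attenuation A(λ) of whatever sample sits in the beam, with the same source and system used for tissue and standard:

$$I_{\mathrm{tis}}(\lambda) = I_0(\lambda)\,S(\lambda)\,10^{-A_{\mathrm{tis}}(\lambda)}, \qquad I_{\mathrm{std}}(\lambda) = I_0(\lambda)\,S(\lambda)\,10^{-A_{\mathrm{std}}(\lambda)}$$

$$\mathrm{OD}(\lambda) = \log_{10}\frac{I_{\mathrm{std}}(\lambda)}{I_{\mathrm{tis}}(\lambda)} = A_{\mathrm{tis}}(\lambda) - A_{\mathrm{std}}(\lambda)$$

Under this assumption the source and system terms cancel exactly, and every OD spectrum reports tissue attenuation relative to the attenuation spectrum of the polyurethane standard, A_std(λ), including its scattering contribution, rather than on an absolute scale.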


Component vector p2 enables differentiation between low and high tissue densities through its most important spectral features, a lipid peak at 930 nm and an inverse water peak at 980 nm. Thus, when t2 is positive, lipid-associated attenuation is the dominant feature, as anticipated for fatty or low density tissue. Spectra from high density tissue have negative t2, and water absorption becomes the dominant structure in the component spectrum. Graham et al. (1996)25 also observed this relationship between water and density values when using MRI to quantify percent density. In their study, the water content of the tissue was measured directly and showed adequate correlation with percent tissue density (r = 0.79).

Contributions by hemoglobin to the spectral features of p2 are observed between 625 and 850 nm, where the negative slope and inflection points of the hemoglobin curve are apparent. Dense breast tissue has lower t2 scores than low density tissue, indicating higher hemoglobin and water contributions. Conversely, p3 shows a lipid absorption peak, but water and hemoglobin absorption are absent; hence, if used as a third discriminator, it represents the overall content of fatty tissue. The simultaneous appearance of water and hemoglobin absorption in p2 can be explained physiologically, as tissues with higher water content, and hence cellular content, require an improved vascular supply and, thereby, an increased blood volume.26 Since positive t2 scores are related to low tissue density and positive t1 scores are related to low tissue scatter, the cluster plot of t1 and t2 can be divided into quadrants, as shown in Figure 14, highlighting the relationship between the spectral features and the known physical attributes of breast tissue.
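The quadrant interpretation above can be written as a simple lookup. This sketch assumes the quadrant boundaries lie at zero for both scores (scores of mean-centered data are centered near zero); the actual boundaries drawn in the figure may differ, and the function name is hypothetical.

```python
def quadrant_label(t1, t2):
    """Map a (t1, t2) score pair to the tissue-property quadrant.

    Follows the text: positive t1 -> lower scattering, positive t2 -> lower
    density. Using zero as the boundary for both axes is an assumption.
    """
    scatter = "low scatter" if t1 > 0 else "high scatter"
    density = "low density" if t2 > 0 else "high density"
    return f"{scatter}, {density}"
```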

While a cluster plot based on t3 and t4 does not allow good differentiation between high and low density tissue, the corresponding component spectra p3 and p4 show interesting effects: p3 shows a red-shifted lipid peak and a small blue-shifted water peak, and p4 shows influence from both forms of hemoglobin, with the same slope as p2 but inverse inflection points. The underlying physical or physiological effects for these observations are unclear at this time. While the amplitudes and general shape of these spectra are similar to p2, the magnitudes of the scores t3 and t4 are much smaller than those of the first two components, and may represent only relative corrections to p2.


Conclusions 


In vivo optical transillumination spectroscopy is technically feasible and capable of predicting breast tissue densities with good correlation to mammographic densities, and hence has good potential to be developed into a preferred method of cancer risk assessment. However, the strength of a direct correlation with cancer risk still needs to be proven in a case-control study, and possibly in a longitudinal study to estimate the validity of the correlation, and hence its predictive value, over a longer period of time.

According to the results of the current study, the odds ratio of the transillumination measurements is anticipated to be close to that of the parenchymal densities seen on mammograms (i.e. between 4 and 6), since the PCA results show HDM and LDM values close to or above 0.90.

Transillumination spectroscopy may offer a novel “first step” in the risk assessment of healthy women regardless of age, menstrual cycle, ethnic background or menopausal status, as the data and analysis presented here were not stratified by any of these factors.

Spectral features associated with tissue density prediction include water and lipid absorption, as well as hemoglobin absorption. Light scattering also contributes to the differentiation of breast tissue density once the data are corrected for physical breast thickness.

HDM and LDM values close to or above 90% are very promising at this stage for distinguishing between low and high density tissues, as they are higher than the 70-80% reported for other physical examinations, such as ultrasound27 and magnetic resonance imaging.25


Optical transillumination spectroscopy offers the potential of a real-time and cost-effective method able to classify tissue densities for breasts up to 7 cm in thickness with the current instrument. Improvements in CCD technology, such as deep-depletion devices, can increase the opto-electronic detection efficiency and thus the detection capability. An added advantage of transillumination spectroscopy over ultrasound and MRI is that results are derived from preset mathematical models, and hence no additional trained personnel are required for image interpretation or assessment. This reduces the overall cost of this risk-assessment technique to the healthcare system. The compactness of the device makes it highly mobile and ideal for serving remote areas or developing countries. A painless procedure and the inherent safety of this method will likely contribute to a higher compliance rate, which may in turn help to improve overall survival rates.

One notable limitation of this preliminary study was the number of study subjects, which may have resulted in sub-optimal predicted values for HDM and LDM. Also, by using cluster analysis in three or more dimensions, other components such as p3 could be included to improve the classification of tissue density.

X-ray mammography uses ionizing radiation and is considered unacceptable as a tool to assess breast density in women younger than forty years of age and for frequent measurement, whereas transillumination spectroscopy is safe for women of all ages. This allows risk assessment to commence at a much younger age, when lifestyle and diet are perhaps easier to influence, giving mild risk reduction interventions one to two more decades to effectively reduce the cancer risk, thus ultimately leading to reduced incidence rates.


While  optical  transillumination  spectroscopy  may  be  a  promising  tool  to  monitor 
the  effectiveness  of  risk  reduction  interventions  such  as  chemopreventive,  dietary  or 
lifestyle  changes  aimed  at  the  reduction  of  breast  cancer  risk,  its  ability  to  detect  physical 
changes  over  a  period  of  time  in  the  breast  tissue  of  a  given  individual  needs  to  be 
demonstrated  in  a  prospective  longitudinal  study. 
Figure Captions

Figure 1: Examples of x-ray based mammograms showing breasts with either a) high or b) low tissue density. Note: different x-ray exposures were used for the two examples.

Figure 2: Absorption spectra of some of the major chromophores in breast tissue: on the left, water (grey) and lipid (black); on the right, hemoglobin (black) and oxygenated hemoglobin (grey).15

Figure 3: Graph of the scattering coefficient (μs) of adipose (black) and fibrous (grey) breast tissue in the wavelength range of interest. Adapted from Troy et al.11

Figure 4: Set-up schematic of the transmission measurement system, comprising a cw white light source, optodes (liquid light guide and fiber bundle) in the caliper mount, breast support, spectrophotometer and CPU.

Figure  5:  Typical  spectra  from  a  volunteer  after  correction  for  the  spectral  system  transfer 
function  and  tissue  thickness.  Note  the  good  reproducibility  between  corresponding  sides 
of  the  bilateral  organ. 

Figure 6: Effect of boundary losses at the breast circumference. A: Attenuation spectra from a volunteer at various distances from the breast circumference (center position black, medial position dark grey, 2 cm from circumference grey and 1 cm from circumference light grey). B: Ratio of the same spectra over the average of all four spectra, to exaggerate spectral variance due to boundary losses.
Figure 7: Repeatability of t1 and t2 in one volunteer at all 8 positions. The slope of the regression line is 1.03 and 0.87, and the Pearson correlation coefficient is 0.72 and 0.84, for t1 and t2, respectively.

Figure 8: Plot of components p1 to p4 (black to light grey, respectively) from PCA using tissue thickness and spectral transfer function corrected spectra.

Figure 9: Cluster plot of t1 vs. t2 resulting from PCA using thickness and system spectral transfer function corrected spectra from volunteers with high (square) or low (rhombus) breast tissue density. Only scores for the center measurements are shown, with the training spectra shown as closed and the validation spectra as open symbols.


Figure 10: Raw data spectrum (black) and reconstruction using either only the first two components (light grey) or the first four components (grey), based on the principal components shown in Figure 8.

Figure 11: Component spectra p1 to p4 resulting from spectra corrected only for the system transfer function.

Figure 12: Cluster plot of t1 vs. t2 resulting from PCA using only system spectral transfer function corrected spectra from volunteers with high (square) or low (rhombus) breast tissue density. Only scores for the center measurements are shown, with the training spectra shown as closed and the validation spectra as open symbols.

Figure 13: Comparison of the a) t1 and b) t2 scores for the left and right breasts in volunteers with either high or low tissue density. Black diamonds represent volunteers from the training set; grey squares represent those from the validation set. The slope and Pearson correlation coefficient are 0.94 and 0.76 for t1, and 1 and 0.83 for t2, respectively.

Figure 14: Cluster plot of thickness and spectral system transfer function corrected data of high (square) and low (rhombus) tissue density volunteers. The four quadrants indicate common optical and anatomical tissue properties.
Table 1: Distribution of recruited volunteers and population proportions, from the National Breast Screening Study (ages 40-59).21

Density category | Pre-menopausal (training / validation / total) | Post-menopausal (training / validation / total) | Total | Study proportion (%) | Population proportion (%)
Low | 4 / 1 / 5 | 25 / 8 / 33 | 38 | 41 | 37
Medium | 13 / 5 / 18 | 13 / 5 / 18 | 36 | 39 | 49
High | 8 / 3 / 11 | 5 / 2 / 7 | 18 | 20 | 14
Table 2: HDM and LDM of Principal Component Analysis results for training and validation set measurements.

Data pre-processing | Training set HDM | Training set LDM | Validation set HDM | Validation set LDM
Transfer function corrected (Figure 12) | 84.6% | 97.0% | 87.5% | 90.3%
Thickness and transfer function corrected (Figure 9) | 88.4% | 93.1% | 92.5% | 88.8%
Autoscaled, transfer function corrected (data not shown) | 85.6% | 94.4% | 90.0% | 86.1%
Autoscaled, thickness and transfer function corrected (data not shown) | 86.5% | 91.8% | 92.5% | 90.3%
ABSTRACT


Breast cancer is the most commonly occurring cancer in women. The lifetime risk of being diagnosed with breast cancer is approximately 1 in 10, the highest of all cancers. Breast cancer screening programs have been shown to decrease mortality rates in women aged 50-69, since cancers are detected at an earlier, more favourable stage. It is apparent that the development of breast cancer is a slow process following the initial transformation of the breast tissue. Hence, there has been a strong effort within the research community to understand risk factors for the disease. Risk factors are defined as those characteristics that are more common in people with the disease than in the normal population. Quantification of an individual’s breast cancer risk may lead that individual to modify her lifestyle and/or diet, and such lifestyle changes could lead to a reduction in the incidence of breast cancer.

Anatomically, the presence of increased amounts of fibroglandular tissue raises the estimated risk by up to 6-fold (corrected for age), hence representing one of the strongest known risk factors pertaining to the entire female population. In this study, the relative area of mammographic densities within a mammogram will be used as a global risk assessment tool. It has been shown previously that quantification of water, lipids, haemoglobin and other chromophores of the optically interrogated breast tissue, which also give rise to the mammographic densities, is feasible through near-infrared spectroscopy. Thus, the hypothesis for this study is that optical transillumination spectroscopy provides consistent and/or complementary information to conventional mammography in quantifying breast density.


Keywords: optical transillumination spectroscopy, breast tissue density, breast cancer risk, principal component analysis.

1. INTRODUCTION

Preventive oncology involves two tasks: first, identification of the population at risk, and second, the implementation and efficacy monitoring of intervention strategies. The first task requires methods and techniques based on physical measurements that identify individuals who are at high risk and who would benefit most from the interventions. These risk quantification techniques must be applicable to the entire population and allow individuals at high risk of developing cancer to be identified with high sensitivity and specificity. The latter is important for several reasons: to identify all individuals at risk, to decrease the number of individuals undergoing unnecessary treatment (i.e. those at low or medium risk), and to empower individuals to make informed decisions regarding their health and the potential benefits of risk reduction strategies. While some genetic predispositions towards certain cancers are known, tests for them often target only a small subpopulation, such as women with BRCA1 and BRCA2 genetic mutations [1], and hence are not suitable for population-wide risk screening.

Breast cancer is the most commonly occurring cancer in women. The lifetime risk of developing breast cancer is 1 in 5-10, depending on the reporting agency [2]. While these statistics are rather poor, one needs to consider that the tissue transformation preceding breast cancer can occur 20 years prior to the development of the disease, thereby providing a “window of opportunity” to employ risk reduction interventions such as modifications in lifestyle or diet, chemopreventive treatments (e.g. Tamoxifen, Raloxifene) [4] or, in severe cases such as women with BRCA1 and BRCA2 genetic mutations or similar risk, prophylactic mastectomy [5]. All of these interventions are designed to prevent cancer, and unavoidably they will influence the quality of a woman’s life, particularly the rather aggressive interventions designed to achieve a risk reduction within only a few months or years. In contrast, identifying women at risk at an earlier age may allow an adequate reduction in breast cancer risk through relatively small lifestyle changes, such as exercise or diet, acting over a longer period of time. Thus, a risk assessment technique should be inherently compatible with screening women at an earlier age.

Current  methods  of  establishing  the  breast  cancer  risk 
include  the  Gail  Risk  Model  ^  and  parenchymal 
density  patterns  derived  from  standard  x-ray 
mammography.  The  former  is  mostly  based  on 
demographic  information  while  the  latter  represents  a 
physical  property  of  tissue.  Parenchymal  density 
patterns  reflect  the  ratio  of  glandular  and  connective 
tissue  to  adipose  tissue  within  the  breast.  Women  with 
mammographically  dense  tissue  occurring  in  more  than 
70%  of  the  breast  are  4  to  6  times  more  likely  to 
develop  breast  cancer  than  those  with  low  tissue  density 
(< 25%) [6]. Because of the inherent risks of x-rays, mammography is not recommended as an annual diagnostic for women younger than 40 in most countries [7], thereby limiting the time available for risk reduction interventions to exert their influence. Similarly, the demographic information required for the Gail Risk Model is often not available for women until they have reached their late 30s or 40s. Strictly speaking, the Gail Risk Model is only valid for women over 30.

Near-infrared optical transillumination spectroscopy (OTS) has been shown to give information about breast tissue composition [8]. Specifically, OTS yields unique optical spectra, governed by hemoglobin, water and lipid absorption in the tissue and by the average scattering power [9]. The differences in chromophore contributions in the tissue have also been shown to mirror the x-ray dense and x-ray lucent areas of the mammogram [10]. In contrast to mammography, which is based on ionizing radiation probing the elemental composition of tissue, OTS uses non-ionising radiation with quantum energies that interact with the electronic and vibrational levels of the molecules, and thus reflects more the anatomical and metabolic status of the breast. Consequently, OTS can be used more frequently and on younger women, and should be able to detect differences in tissue composition between high and low risk groups based on molecular composition.


To establish the feasibility of OTS as a breast cancer risk assessment technique, correlations between the spectra and the parenchymal density pattern were established. These correlations were initially established using principal component analysis (PCA), with tissue density classifications by a radiologist as the reference standard. Multivariate analysis techniques are used to determine whether stratifying the spectral data by either measurement position on the breast in cranial-caudal projection (center, medial, distal, or lateral) or physical parameters, such as the volunteer's age and body mass index (BMI), improves tissue density prediction by OTS.

2.  MATERIALS  AND  METHODS 

2.1  Patient  recruitment 

The data set used in this study includes a collection of mammograms and OT spectra gathered from 156 volunteers recruited through the Marvelle Koffler Breast Centre at Mount Sinai Hospital, Toronto, Ontario. All women had had a film screening mammogram within the 12 months before recruitment that was negative for cancer lesions and did not show a large bilateral variation in tissue density. None of the women had had previous surgery or other alterations to the breast, including reduction, implants or tattoos.

Film mammograms were classified on a nominal scale by a radiologist (Dr. Roberta Jong) into low (< 25%), medium (25% to 75%) or high (> 75% dense tissue area) density categories. Sixty-one women were classified by the radiologist as having low, 68 as having medium, and 27 as having high tissue density. The proportions of the tissue density categories in this study closely reflect those seen in the Canadian National Breast Screening Study [12] (Table 1). The age of the volunteers ranged from 36 to 72 years.

2.2  Visible  near-infrared  transmission 

A 12 W halogen lamp served as the broadband light source. Ultraviolet radiation and part of the visible spectrum were eliminated using a cut-on filter, and mid-infrared radiation using a heat rejection filter; the remaining light was coupled into a 5 mm diameter liquid light guide in contact with the skin on top of the breast. A total power of 0.25 W, covering the 550 to 1300 nm band, was delivered to the skin. Transmitted light was collected via a 7 mm diameter optical fiber bundle (P&P Optica, Kitchener, Waterloo; 140 fibres, 200 µm, N.A. 0.36), and wavelength-dependent detection in the visible and near-infrared was achieved using a Kaiser spectrograph with a holographic transmission grating (15.7 rules/mm, blazed at 850 nm) and a 2D cryogenically cooled silicon CCD (Photometrics, New Jersey, USA) at a spectral resolution of better than 3 nm between 625 and 1060 nm. Spectral resolution was set by positioning a 0.5 mm slit between the distal end of the collection fibre and the spectral grating. The back-thinned CCD has its peak quantum efficiency at 780 nm, with a quantum efficiency of 0.2 remaining at 1060 nm. By imaging the entrance slit of the spectrograph onto the 2D CCD, 50 rows of pixels were exposed to the detected light, increasing the dynamic range of the electronic detection by a factor of > 30. Cryogenic cooling was used to minimize background noise. The signal-to-noise ratio was further improved by using exposure times of 3 to 5 seconds and averaging 5 scans for all spectra. Figure 1 shows a block diagram of the setup.


All spectra were corrected for daily variations in the wavelength-dependent signal transfer function of the optical system and for the thickness of the interrogated tissue. To correct for the signal transfer function, spectra were referenced to a transmission standard made of 1 cm thick ultra-high-density polyurethane (Gigahertz Optics, Munich, Germany). Consequently, all volunteer spectra are expressed in units of optical density per centimeter [OD/cm], given by the negative logarithm of the ratio of the raw data spectrum to the reference spectrum of the polyurethane block, plus the optical attenuation of the polyurethane block, all divided by the interoptode distance. The scattering and absorption properties of the standard were measured in a separate experiment using an integrating sphere diffuse reflectance set-up [13] (OD of about 1.8 to 2.3 over the wavelength range of interest).
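The correction described above can be sketched as a per-wavelength formula, OD/cm = (-log10(I_tissue / I_ref) + OD_ref) / d. This is a minimal sketch, assuming that the spectra are equal-length lists on a common wavelength grid, that the logarithm is base 10 (as is usual for optical density), and that the standard's optical density `od_ref` was measured separately; `od_per_cm` is a hypothetical name.

```python
import math

def od_per_cm(raw, ref, od_ref, interoptode_cm):
    """Thickness- and transfer-function-corrected attenuation in OD/cm.

    raw            -- intensities transmitted through the breast, one per wavelength
    ref            -- intensities through the 1 cm polyurethane standard (same grid)
    od_ref         -- optical density of the standard at each wavelength
    interoptode_cm -- source-detector separation (tissue thickness) in cm
    """
    if interoptode_cm <= 0:
        raise ValueError("interoptode distance must be positive")
    # -log10(raw/ref) is attenuation relative to the standard; adding od_ref
    # restores the standard's own attenuation before normalizing by thickness
    return [(-math.log10(r / s) + o) / interoptode_cm
            for r, s, o in zip(raw, ref, od_ref)]

# Tissue passing 1/100 of the reference light, OD_ref = 2, thickness 5 cm
print(od_per_cm([1.0], [100.0], [2.0], 5.0))
```

Dividing by the interoptode distance is what makes spectra from breasts of different thickness comparable.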

All volunteer measurements were taken in the dark, with the volunteer seated comfortably in an upright position and the breast resting on a horizontal platform. A total of eight measurements in cranial-caudal projection were taken, four per breast (center, medial, distal, and lateral). Data acquisition for all eight measurements typically took 160 to 200 seconds. The source and detector fibers (optodes) were held coaxially, pointing towards each other, by a caliper attached to the resting platform, which also provided the interoptode distance. The source fibre was placed against the skin on the top surface of the breast with minimal compression. Considering typical tissue optical properties and a tissue thickness of 5 cm, an ovoid volume of approximately 30 cm³ is interrogated.


Table 1. Breakdown of study volunteers, including study and population proportions and the total number of spectra analysed (in parentheses).

Density category   Training set   Validation set   Proportion (study)   Proportion (population)
Low                 46 (368)       15 (120)         39.0%                37.0%
Medium              51 (408)       17 (136)         43.6%                49.0%
High                20 (160)        7 (56)          17.3%                14.0%
Totals             117 (936)       39 (312)

2.3  Data  Analysis 

Principal component analysis (PCA) was used to establish a correlation between the optical transillumination spectra and mammographic density. PCA is a multivariate method that reduces a set of vectors (i.e., spectra) to a few orthogonal components whose scores can then be compared with nominal data (i.e., tissue density) [11–15]. PCA determines the variance within the population of spectra and iteratively reduces the data to a few representative spectra called components (pi). Scores (ti) are then assigned to each individual's spectra to show how much of each component is present in the original data spectrum. The scores of two or more components can also be plotted against one another to identify clusters of spectra that are closely related and exhibit common traits. Clusters are assigned an outcome, here low versus high tissue density, and lines or planes separating the clusters are determined analytically. Scores that enable differentiation between tissue densities identify useful component spectra and hence the chromophore contributions to density and thus, indirectly, to risk.
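The decomposition into component spectra and scores can be sketched with a singular value decomposition. This is a minimal NumPy sketch under stated assumptions: spectra are stored as matrix rows, SVD stands in for whatever iterative algorithm (e.g. NIPALS) was actually used, and whether the data were mean-centred is not stated in the text, so it is left as an option. `fit_pca` and `project` are hypothetical names; `project` also shows how validation spectra are scored against components derived from the training set.

```python
import numpy as np

def fit_pca(spectra, n_components=4, center=True):
    """Return (mean, components, explained) for spectra stored row-wise.

    center=True subtracts the mean spectrum first (an assumption; the
    original analysis does not state whether the data were centred).
    """
    X = np.asarray(spectra, dtype=float)
    mean = X.mean(axis=0) if center else np.zeros(X.shape[1])
    # Rows of vt are orthonormal component spectra p_i, ordered by variance
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    # Fraction of the total sum of squares per component (variance if centred)
    explained = s**2 / np.sum(s**2)
    return mean, vt[:n_components], explained[:n_components]

def project(spectra, mean, components):
    """Scores t_i: amount of each component present in each spectrum."""
    return (np.asarray(spectra, dtype=float) - mean) @ components.T

# Toy data: 6 "spectra" at 5 wavelengths mixed from two underlying shapes
rng = np.random.default_rng(0)
shapes = np.array([[1.0, 2.0, 3.0, 2.0, 1.0],
                   [0.0, 1.0, 0.0, -1.0, 0.0]])
train = rng.normal(size=(6, 2)) @ shapes
mean, comps, explained = fit_pca(train, n_components=2)
scores = project(train, mean, comps)                  # training-set scores
new_scores = project(train[:2] + 0.01, mean, comps)   # "validation" spectra
print(scores.shape, explained.round(4))
```

Because the toy spectra are built from two shapes, two components capture essentially all of the variance, mirroring how a few components account for 99.99% of the variance in the breast spectra.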

In this study, PCA was executed on a training set comprising 75% of all spectra (n = 936; 117 volunteers × 8 measurements), randomly selected within each of the defined tissue density categories. In this manner, the relative contribution of each tissue density category was retained in the data sets during the analysis. The remaining 25% of the spectra (n = 312; 39 volunteers × 8 measurements) were placed in a validation set (Table 1). The principal components (pi) derived from the training set spectra were then used to determine the scores (ti) of the validation set spectra, thereby testing the predictive ability of the model. The PCA data sets were not stratified by menopausal status (i.e., pre- versus post-menopausal), since a previous study by our group [11] demonstrated no influence of the menstrual cycle on the spectral measurements.
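The category-preserving split can be sketched as follows. This minimal sketch assumes the split is drawn at the volunteer level, so that all eight spectra of a volunteer stay in the same set (consistent with the counts in Table 1), and uses a fixed seed for reproducibility; `stratified_split` is a hypothetical helper.

```python
import random
from collections import defaultdict

def stratified_split(volunteer_density, train_fraction=0.75, seed=0):
    """Split volunteer IDs into training/validation within each density category."""
    by_category = defaultdict(list)
    for vid, category in volunteer_density.items():
        by_category[category].append(vid)
    rng = random.Random(seed)
    train, validation = [], []
    for category in sorted(by_category):
        ids = sorted(by_category[category])   # sort first so the shuffle is reproducible
        rng.shuffle(ids)
        n_train = round(train_fraction * len(ids))
        train += ids[:n_train]
        validation += ids[n_train:]
    return train, validation

# Category sizes from the study: 61 low, 68 medium, 27 high density volunteers
density = {f"{cat}-{i}": cat
           for cat, n in (("low", 61), ("medium", 68), ("high", 27))
           for i in range(n)}
train, validation = stratified_split(density)
print(len(train), len(validation))
```

With these category sizes the rounding reproduces the 46/51/20 training and 15/17/7 validation volunteers listed in Table 1.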

To determine the predictive value of the PCA model, and hence of OTS, two measures comparable to sensitivity and specificity, a high density measure (HDM) and a low density measure (LDM), were determined for both the training and validation sets. HDM is the number of women with high tissue density who were correctly predicted as high density divided by the number of all women with high density; LDM is, correspondingly, the number of correctly predicted women with low tissue density divided by the number of all women with low density.
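Because HDM and LDM are simply the fractions of correctly classified high- and low-density women (sensitivity and specificity when "high" is treated as the positive class), they can be computed directly from paired labels. A minimal sketch, assuming labels are the strings "high" and "low"; `hdm_ldm` is a hypothetical name.

```python
def hdm_ldm(true_labels, predicted_labels):
    """Return (HDM, LDM) in percent.

    HDM = correctly predicted high-density women / all high-density women
    LDM = correctly predicted low-density women  / all low-density women
    """
    pairs = list(zip(true_labels, predicted_labels))
    n_high = sum(1 for t, _ in pairs if t == "high")
    n_low = sum(1 for t, _ in pairs if t == "low")
    hits_high = sum(1 for t, p in pairs if t == "high" and p == "high")
    hits_low = sum(1 for t, p in pairs if t == "low" and p == "low")
    return 100.0 * hits_high / n_high, 100.0 * hits_low / n_low

# Toy example: 7 high-density women (one missed) and 15 low-density women
truth = ["high"] * 7 + ["low"] * 15
pred = ["high"] * 6 + ["low"] + ["low"] * 15
print(hdm_ldm(truth, pred))
```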

HDM and LDM were determined using a separation plane that differentiated the high and low tissue density clusters formed by the training set spectra in three-dimensional score space. Analysis was executed on cluster plots using either the component scores $t_i$ of all individual spectra (n = 528) or their means per individual, $\bar{t}_i$ (n = 66, Table 1). For this purpose, each tissue density cluster was represented by its plane of best fit, calculated by linear regression analysis and defined by:

$$t_z = b + a_x t_x + a_y t_y \qquad (2.1)$$

where $t_z$ is one component score, chosen at random as the dependent variable; $t_x$ and $t_y$ are the independent component scores (using either $t_i$ or $\bar{t}_i$); $a_x$ and $a_y$ are the resulting slopes; and $b$ is the z-intercept.
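Equation (2.1) is an ordinary least-squares fit of one score against the other two. A minimal NumPy sketch, assuming the scores of one cluster are given as three equal-length arrays; `fit_plane` is a hypothetical helper.

```python
import numpy as np

def fit_plane(tx, ty, tz):
    """Least-squares fit of tz = b + ax*tx + ay*ty; returns (b, ax, ay)."""
    tx, ty, tz = (np.asarray(v, dtype=float) for v in (tx, ty, tz))
    # Design matrix columns: intercept, tx, ty
    design = np.column_stack([np.ones_like(tx), tx, ty])
    (b, ax, ay), *_ = np.linalg.lstsq(design, tz, rcond=None)
    return b, ax, ay

# Points lying exactly on tz = 1 + 2*tx - 0.5*ty recover those coefficients
tx = np.array([0.0, 1.0, 2.0, 0.0, 1.0])
ty = np.array([0.0, 0.0, 1.0, 2.0, 3.0])
tz = 1.0 + 2.0 * tx - 0.5 * ty
print(fit_plane(tx, ty, tz))
```

In practice the fit would be run once for the low-density cluster and once for the high-density cluster.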

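The equation of the plane $t_s$ separating the high and low tissue density training sets was calculated as the plane halfway between the low and high tissue density planes defined by equation (2.1):

$$t_s = \frac{1}{2}\Big[\big(b_{\mathrm{low}} + a_{x,\mathrm{low}}\,t_{x,\mathrm{low}} + a_{y,\mathrm{low}}\,t_{y,\mathrm{low}}\big) + \big(b_{\mathrm{high}} + a_{x,\mathrm{high}}\,t_{x,\mathrm{high}} + a_{y,\mathrm{high}}\,t_{y,\mathrm{high}}\big)\Big] \qquad (2.2)$$

where $a_{x,\mathrm{low}}$, $a_{y,\mathrm{low}}$ and $a_{x,\mathrm{high}}$, $a_{y,\mathrm{high}}$ are the slopes for the low and high tissue density clusters, respectively; $b_{\mathrm{low}}$ and $b_{\mathrm{high}}$ are the respective z-intercepts; and $t_{x,\mathrm{low}}$, $t_{y,\mathrm{low}}$, $t_{x,\mathrm{high}}$, $t_{y,\mathrm{high}}$ are the respective component scores, again using either $t_i$ or $\bar{t}_i$.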
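Evaluated at a single point (t_x, t_y), equation (2.2) is the average of the two cluster planes, so a score triple can be assigned to whichever side of the halfway plane it falls on. The decision rule below (a point is called high density when it lies on the same side of the halfway plane as the high-density plane) is an assumption, since the text does not spell it out; `plane_value` and `classify` are hypothetical names.

```python
def plane_value(coeffs, tx, ty):
    """Evaluate tz = b + ax*tx + ay*ty for coefficients (b, ax, ay)."""
    b, ax, ay = coeffs
    return b + ax * tx + ay * ty

def classify(point, low_coeffs, high_coeffs):
    """Assign 'high' or 'low' density to a score triple (tx, ty, tz)."""
    tx, ty, tz = point
    z_low = plane_value(low_coeffs, tx, ty)
    z_high = plane_value(high_coeffs, tx, ty)
    z_sep = 0.5 * (z_low + z_high)          # equation (2.2) at this (tx, ty)
    # Same side as the high-density plane -> "high"; points exactly on the
    # separating plane (or where the two planes coincide) default to "low"
    return "high" if (tz - z_sep) * (z_high - z_low) > 0 else "low"

low_plane = (0.0, 1.0, 0.0)     # tz = tx
high_plane = (-2.0, 1.0, 0.0)   # tz = tx - 2 (high cluster lies below)
print(classify((1.0, 0.0, -1.5), low_plane, high_plane))
print(classify((1.0, 0.0, 0.8), low_plane, high_plane))
```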

Differences in component scores t1 to t4 between measurement positions (center, medial, distal and lateral) were tested non-parametrically using the Kruskal-Wallis rank test. When testing for differences between measurement positions, only measurements from the left breast were used, since we have previously shown [11] no significant difference between component scores from the left and right breast.
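The Kruskal-Wallis H test compares the rank distributions of several independent groups, here the four measurement positions. The plain-Python sketch below implements the standard tie-corrected H statistic with average ranks and takes the p-value from the usual chi-squared approximation with k - 1 degrees of freedom; it is for illustration only, a statistics library routine (e.g. `scipy.stats.kruskal`) would normally be used, and the helper names and example scores are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-squared variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc term plus a finite series in sqrt(x)
    term, acc = math.sqrt(x), 0.0
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= x / (2 * k - 1)
        acc += term
    return math.erfc(math.sqrt(x / 2)) + 2 * math.exp(-x / 2) / math.sqrt(2 * math.pi) * acc

def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H statistic and chi-squared p-value."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    h /= 1.0 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    return h, chi2_sf(h, len(groups) - 1)

# Hypothetical t1 scores at the four positions (illustrative numbers only)
center = [-2.1, -1.9, -2.0, -1.8]
medial = [-2.6, -2.4, -2.5, -2.3]
distal = [-1.7, -1.5, -1.6, -1.4]
lateral = [-2.9, -2.7, -2.8, -3.0]
print(kruskal_wallis(center, medial, distal, lateral))
```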

PCA-derived component scores t1 to t4 for the low and high tissue density categories were examined as a function of the volunteer's age and body mass index (BMI) using Spearman's rank correlation analysis. Linear regression analysis was also executed for component scores t1 to t4 as a function of age or BMI. For the correlation and linear regression analyses, either all scores (n = 1248) or scores averaged per individual (n = 156) were used.
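Spearman's rank correlation is Pearson's correlation computed on ranks, and the linear regression is an ordinary least-squares line. A plain-Python sketch of both, with p-values omitted (a statistics library, e.g. `scipy.stats.spearmanr` and `scipy.stats.linregress`, would normally supply them); the helper names and the age and score values are hypothetical.

```python
import math

def ranks(values):
    """1-based ranks, ties receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rank correlation coefficient rho."""
    return pearson(ranks(x), ranks(y))

def ols_line(x, y):
    """Least-squares slope and intercept of y = intercept + slope * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

age = [38, 45, 52, 60, 67]                # hypothetical ages
score = [-2.0, -1.8, -1.9, -1.5, -1.2]    # hypothetical mean scores t1
print(spearman(age, score), ols_line(age, score))
```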

For  all  analyses,  p-values  <  0.05  were  considered  to  be 
statistically  significant. 

3.  RESULTS 

3.1  PCA  of  density  categories 

Figures 1a and 1b show examples of the raw data collected for women with high and low tissue density, respectively. Figure 1c shows the first four principal components (pi) resulting from PCA of the non-stratified training spectra (n = 936). These first four components contain 99.87%, 0.06%, 0.05% and 0.01% of the variance in the total data set, respectively, yielding a combined total of 99.99%. In principle, the data can therefore be represented in a four-dimensional score space.

A three-dimensional cluster plot of the component scores t1, t2 and t3 from the training spectra for high and low tissue densities only (n = 528) is presented in Figure 2a. A corresponding plot of the mean component scores t1, t2 and t3 per individual for high and low tissue densities in the training and validation sets (n = 88) is shown in Figure 2b and demonstrates improved cluster separation. Thus, spectra from breast tissue classified by the radiologist as either high or low density show differences in their spectral composition, resulting in reasonably tight clusters.


Figure 1. Examples of transillumination spectra from women with a) high tissue density and b) low tissue density, and c) the resulting principal components p2 to p4 of the thickness- and transfer-function-corrected spectra, all given as a function of wavelength (nm).


3.2 PCA of density categories by measurement position

For the low tissue density category of the training set, a Kruskal-Wallis rank test demonstrated a significant difference in component scores t1 to t4 between measurement positions (all p < 0.01). For the high tissue density training set, only component score t2 differed significantly between measurement positions. As a result, PCA was repeated separately for each measurement position (center, medial, distal and lateral). Spectra from both the left and right breasts were used, i.e., for each position, 234 spectra from 117 volunteers for training and 78 spectra from 39 volunteers for validation.

Figure 2. Three-dimensional cluster plots of t1, t2 and t3 for all component scores (a) and mean scores per individual (b), resulting from thickness- and transfer-function-corrected spectra of high and low tissue density. Open symbols: scores from tissue classified as low density; closed symbols: scores from tissue classified as high density.

Figure 3 shows the first four principal components (pi) for the center and distal positions after stratification by measurement position. Of the variance in the total data set, the first component p1 contains between 99.85% and 99.91%, p2 between 0.05% and 0.08%, p3 between 0.03% and 0.05%, and p4 0.01%. Three-dimensional cluster plots of component scores t1, t3 and t4 at the center position and of scores t1, t2 and t4 at the distal position are shown in Figure 4 for the combined high and low tissue density data sets (n = 176).


Figure 3. Principal components p1 to p4 for the center (a) and distal (b) measurement positions, plotted against wavelength (nm).


3.3  HDM  and  LDM 

Table 2 shows the resulting HDM and LDM calculated using scores t1, t2 and t3 for the training and validation sets, using either all component scores (training n = 528, validation n = 176) or the mean component scores per individual (training n = 66, validation n = 22). For both the training and validation sets, HDM and LDM increase when the mean component scores per individual are used.

Table  3  provides  the  best  HDM  and  LDM  results  for 
the  training  and  validation  sets  for  each  measurement 
position.  The  center  and  distal  positions  show  the  best 
HDM  and  LDM  for  the  training  and  validation  sets. 


Figure 4. Three-dimensional cluster plots of t1, t3 and t4 for the center position (a) and of t1, t2 and t4 for the distal position (b), resulting from thickness- and transfer-function-corrected spectra of high and low density tissue. Open symbols: low tissue density; closed symbols: high tissue density. Circles refer to the training set and squares to the validation set.
Table 2. HDM and LDM for the training and validation sets using all component scores and mean scores per individual.

Scores used                   Training set [%]      Validation set [%]
                              HDM      LDM          HDM      LDM
All scores (t1, t2, t3)       76.9     88.3         75.0     96.7
Mean score per individual     85.0     89.1         85.7     100.0

Table 3. HDM and LDM for the training and validation sets for each measurement position.

Position   Scores used      Training set [%]      Validation set [%]
                            HDM      LDM          HDM      LDM
Center     (t1, t3, t4)     95.0     87.0         92.9     90.0
Distal     (t1, t2, t4)     90.0     91.3         100.0    100.0
Medial     (…, t4)          77.5     71.7         85.7     86.7
Lateral    (…, t3)          80.0     95.7         71.4     100.0

3.4 Number of Measurement Positions

Figure 5 shows histograms of the number of spectra per individual that predicted high tissue density. If an individual is classified as high density when the scores ti from three or more of four spectra indicate high tissue density, the best HDM and LDM are obtained using spectra from the center and distal measurement positions (compare Fig. 5a with 5b). When medium tissue density is included for the center and distal positions, 33% would be classified as false positives (Figure 6). While this appears to result in a higher overall false-positive rate, particularly since women with medium density make up about half of the population, it should also be considered that the plane of separation was determined solely from the difference between the high and low density clusters.

3.5 PCA and Physical Parameters (Age and BMI)

Linear regression analysis demonstrated a significant correlation between the component scores ti and age or BMI. However, further analysis showed that the age-dependent slopes of the different tissue densities are in effect parallel; hence, without a priori knowledge of the tissue density, no age or BMI correction can be introduced.


Figure 5. Number of spectra per individual that predicted high tissue density, used to predict global tissue density from (a) the center and distal positions (LDM = 96.7%, HDM = 96.3%) and (b) the medial and lateral positions (LDM = 95.1%, HDM = 70.4%). High tissue density is shown in black, low tissue density in grey.

Figure 6. Number of spectra per individual from the center and distal positions that predicted high tissue density, used to predict global tissue density with medium tissue density included (LDM = 67.6%, HDM = 96.3%).
Scatter plots of the mean component scores t1 and t2 per individual as a function of the volunteer's age and body mass index for the high and low tissue densities are shown in Figures 7 and 8, respectively. The results of linear regression analysis between age or BMI and the first two component scores are also indicated. According to Spearman's rank correlation, a significant but weak correlation exists between both component scores t1 and t2 and age or BMI. Similar results were obtained for component scores t3 and t4 (data not shown).
Figure 7. Scatter plots of averaged component scores t1 (top) and t2 (bottom) per individual as a function of age for the high (closed symbols and dashed regression line) and low (open circles and solid regression line) tissue density categories.
Figure 8. Scatter plots of averaged component scores t1 (top) and t2 (bottom) per individual as a function of BMI for the high (closed symbols and dashed regression line) and low (open circles and solid regression line) tissue density categories.

4.  DISCUSSION 

4.1 Principal Component Analysis (all measurement positions)

Input parameters such as known tissue chromophore spectra were not used in training our PCA model. Nevertheless, each derived principal component pi includes a combination of spectral signatures comprising light scattering and chromophore absorption (water, lipid and hemoglobins), which vary with breast tissue density.


Scores associated with principal component p1 contain information on wavelength-independent attenuation due to differential optical path length and light losses at the breast boundary, and have negative values in the PCA model (Fig. 1). Light scattering is tissue dependent, and its wavelength dependence can be described by $\mu_s' = a\lambda^{-b}$, where $\mu_s'$ is the reduced scattering coefficient, $\lambda$ the wavelength, and $a$ and $b$ are tissue-dependent constants. Light losses due to scattering occur because of an increase in photon path length and thus a lower probability of traversing the tissue. The polyurethane standard, also a Mie scatterer, with an albedo > 0.999, was used to calibrate the spectral transfer function of the optical system daily; consequently, the wavelength dependence of the scattering in the breast tissue is minimized or lost. Hence, p1 carries wavelength-independent optical path length information and, via the change in optical path length between tissue types, contributes to the determination of tissue density. For example, low density tissue has reduced scattering compared with high density tissue [17], resulting in higher (i.e., less negative) values of t1, which indicate less attenuation or loss of light and a shorter optical path length.
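The power law above becomes linear in log-log coordinates, ln μs' = ln a - b ln λ, so a and b can be estimated with a straight-line fit. A minimal sketch on synthetic, noise-free values; the wavelengths and the a and b used to generate the data are illustrative assumptions, not measured breast-tissue properties, and `fit_power_law` is a hypothetical name.

```python
import math

def fit_power_law(wavelengths_nm, mu_s_prime):
    """Fit mu_s' = a * lambda**(-b) by least squares in log-log space; returns (a, b)."""
    x = [math.log(w) for w in wavelengths_nm]
    y = [math.log(m) for m in mu_s_prime]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((u - mx) * (v - my) for u, v in zip(x, y))
             / sum((u - mx) ** 2 for u in x))
    intercept = my - slope * mx
    # Slope of the log-log line is -b; its intercept is ln(a)
    return math.exp(intercept), -slope

# Synthetic data generated with a = 1e4 and b = 1.0 (arbitrary units, illustrative)
wl = [650, 750, 850, 950, 1050]
mus = [1e4 * w ** -1.0 for w in wl]
a, b = fit_power_law(wl, mus)
print(round(a, 3), round(b, 6))
```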

Light losses at the breast boundary are also captured in p1. We have shown previously that measurements taken at the medial position differ from the center measurement and are similar to measurements made closer to the edge of the breast [11]. This further suggests that stratification by position is beneficial.

The most important spectral features in the spectrum of p2 are the lipid peak at 925 nm and the inverse water peak at 975 nm (Fig. 1). Low density tissue is characterized by adipose tissue, resulting in positive scores t2 (Fig. 2) when the lipid peak is the dominant spectral feature. Smaller contributions in the low tissue density spectrum are also evident at 770 nm (deoxyhemoglobin) and at 825 nm, a minor lipid absorption peak. Conversely, high density tissue has a higher water content [18], and the scores t2 are predominantly negative when the water peak is the dominant spectral feature (Figs. 1, 2). Contributions from deoxyhemoglobin are also evident in the high tissue density spectrum of p2 between 625 and 750 nm, where the negative slope of the deoxyhemoglobin curve is visible (Fig. 1).

The spectrum of p3 is relatively flat between 625 nm and 875 nm, which can be attributed to contributions from the oxyhemoglobin curve (Fig. 1). Another notable feature is that the lipid and water peaks are positive, in contrast to the spectra of p2 and p4 (see below). Component scores t3 for low density tissue are positive when the lipid peak is the dominant spectral feature (Figs. 1, 2). Component scores t3 for high density tissue are negative, suggesting a shift from an oxyhemoglobin curve to a deoxyhemoglobin curve (as seen in the spectrum of p2). Since decreased oxygenation is associated with increased cellular metabolism [19], this suggests that oxygen saturation is lower in high density tissue than in low density tissue.

The  spectrum  of  p4  is  similar  to  that  of  p2  (Fig.  1), 
however,  for  low  density  tissue,  component  scores  t4 
are  negative  and  for  high  density  tissue,  they  are 
positive  (Fig.  2). 

Cluster plots in the three-dimensional space defined by t1, t2 and t3 result in good separation between high and low tissue densities, whether all component scores ti or the mean scores per individual are used (Figs. 2a and b). The improvement in HDM and LDM for the mean component scores arises because the mammographic classification is global: by the definitions of high (> 75%) and low (< 25%) density, only three of the four quadrant scores need to lie within the corresponding cluster. Averaging the scores reduces the effect of measuring a local volume that does not express the globally assessed density.

4.2 Principal Component Analysis by measurement position

Positional analysis suggests that the PCA model should be trained independently for each measurement position (center, medial, distal, and lateral). The results of the PCA for each individual position and the subsequent analysis of cluster plots in three-dimensional space show that only two positions need to be interrogated: the center and the distal. In general, the component spectra pi for these two positions are comparable, with the exception that component spectrum p2 for the center position is similar to component spectrum p3 for the distal position and vice versa (Fig. 3). Both of these component spectra (p2 at center and p3 at distal) are spectrally flat over a large wavelength region, with poor differentiation of the lipid and water peaks. Consequently, cluster plots in the three-dimensional space defined by t1, t3 and t4 result in good separation between high and low tissue densities for the center position, and cluster plots in the space defined by t1, t2 and t4 result in good separation for the distal position (Fig. 4). The exchange in significance between t2 and t3 for the center and distal positions is most likely due to the small amount of variance they capture, combined with the limited number of observations available to date.

4.3  HDM  and  LDM 
Because the radiologically assigned parenchymal density is a global assessment, and the density is not distributed homogeneously throughout the tissue, calculating HDM and LDM from the component scores of all eight spectra individually introduces additional variability. Using the mean score per individual reduces the variability in the OTS-predicted density within an individual's breast tissue (Fig. 2b; Table 2).

The best estimate of the global classification is obtained by pooling spectra from two different positions. The highest HDM and LDM are achieved using spectra from the center and distal positions (Fig. 5a). The HDM and LDM based on these two positions might slightly overestimate the predictive power of the test, given that women with a large bilateral variation in tissue density were excluded from the analysis. However, the results also suggest that PCA training could be improved if the regional tissue density associated with each optically interrogated volume were known.
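The per-individual decision used with the center and distal positions (an individual is called high density when at least three of her four spectra indicate high density) can be sketched as a simple vote. This minimal sketch assumes each spectrum has already been classified as "high" or "low", for example with the separating plane of equation (2.2); `global_density` and the example labels are hypothetical.

```python
def global_density(spectrum_labels, min_high=3):
    """Classify an individual as 'high' if at least min_high spectra say 'high'."""
    n_high = sum(1 for label in spectrum_labels if label == "high")
    return "high" if n_high >= min_high else "low"

# Four spectra per woman: center and distal positions of both breasts
women = {
    "A": ["high", "high", "high", "low"],   # 3 of 4 high
    "B": ["high", "low", "high", "low"],    # 2 of 4 high
    "C": ["low", "low", "low", "low"],
}
print({wid: global_density(labels) for wid, labels in women.items()})
```

Requiring a majority of the pooled spectra makes the global call robust to a single interrogated volume that does not reflect the overall density.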

The medial position provides acceptable HDM and LDM for our validation set (Table 3). The lateral position provides an LDM comparable to that of the distal position, but the lowest HDM among all positions for our validation set (Table 3). This difference in the predictive value of the different quadrants reflects the spatial prevalence of the parenchymal density pattern within the breast [21].

4.4 Component scores and physical parameters (Age and BMI)

As the population ages, one would expect an increase in component scores t1 and t2, since atrophy of the tissue, which continues past menopause, would result in less light scattering and an increase in lipid content [20]. However, Figure 7 shows only a small increase in component scores t1 with age for both low and high density tissue, and little to no increase in t2 with age for either tissue density category. Hence, as a predictive tool, the odds ratio or relative risk should be independent of age. Similarly, we would anticipate component scores t1 and t2 to increase with BMI. A small but significant association exists between BMI and component scores t1 and t2 for the low tissue density group (Fig. 8). A cluster of scores t1 and t2 from high density tissue is seen at low BMI, which results in an apparently large slope for this group; however, the correlation coefficients are weak.


While some statistically significant correlations can be established between age or BMI and the component scores t1 and t2, these associations also depend on a woman's density classification. Consequently, no correction for age or BMI is possible, since the density of a woman coming in for OTS measurements is not known a priori. This further suggests that the physiological changes in breast tissue density due to age and BMI are already contained in the component spectra and hence captured by the derived component scores.

5.  CONCLUSION 

OTS is a physical assessment technique applicable to the population at risk that requires only the interoptode distance and the location of the measurements on the breast in cranial-caudal projection (center, medial, distal, and lateral). Because the spectra are referenced to a transmission standard and normalized by the interoptode distance, they are independent of the instrument and of the interrogated tissue thickness, and are hence portable between instruments. Furthermore, only two positions, the center and the distal, need to be interrogated to produce sensitivity and specificity values above 96%.
Abstract


Preventive oncology needs a risk assessment technique that can identify individuals at high risk for breast cancer and can monitor the efficacy of a risk-reducing intervention. Optical transillumination spectroscopy (OTS) gives information about breast tissue composition and tissue density. OTS is non-invasive and, in contrast to mammography, uses non-ionising radiation. It is safe and can be used frequently on younger women, potentially permitting early risk detection and thus increasing the time available for risk reduction interventions to exert their influence. Before OTS can be used as a risk assessment and/or monitoring technique, its predictive ability needs to be demonstrated and maximized through the construction of mathematical models relating OTS to an established breast cancer risk factor. Here we selected the parenchymal density pattern as the risk-predicting standard.

To establish a correlation between OTS and parenchymal density pattern, Principal
Components Analysis (PCA) using risk classifications was performed. The PCA scores from 156
volunteers are presented in three-dimensional cluster plots, and a plane of differentiation that
separates the high and low tissue densities is used to calculate the predictive value. Stratification
of the PCA by measurement position on the breast in cranial-caudal projection is introduced.
PCA scores are also examined as a function of the volunteer’s age and body mass index (BMI),
as options for further subject stratification.

A small but significant correlation between the component scores and age or BMI is
noted, but the correlation depends on the tissue density category examined. Correction of the
component scores for age and BMI is not recommended, since it would require a priori
knowledge of a woman’s breast tissue density. Stratification for the center and distal
measurement positions provides a predictive value for OTS above 96%.
Introduction


Preventive oncology involves identification of the population at risk and the
implementation and monitoring of intervention strategies. The first task requires methods and
techniques, based on physical measurements or demographic information, that identify individuals
at high risk who would benefit most from the interventions. These risk quantification techniques
must be applicable to the entire population and must identify individuals at high risk for
developing cancer with high sensitivity and specificity. The technique should exploit a risk
identifier providing a high relative risk or high odds ratio, so that all individuals at risk are identified
while the number of individuals undergoing unnecessary treatment (i.e. those at low risk) is
reduced. Individuals are thus provided with the information required to make informed decisions
regarding their health and the potential benefits of risk reduction strategies. While genetic
predispositions towards certain cancers are known, they affect only a small percentage of the
population; for example, women with BRCA1 and BRCA2 genetic mutations1 comprise only a
fraction of one percent of the general population, and hence this information is not suitable for
population-wide screening.

Breast cancer is the most commonly occurring cancer in women. The lifetime risk for
developing breast cancer is between 1 in 5 and 1 in 10, depending on the reporting agency2. The
tissue transformation preceding breast cancer can occur 20 years prior to the development of the
disease, thereby providing a “window of opportunity” to employ risk reduction interventions
such as lifestyle modification, dietary change, chemopreventive treatment (e.g. Tamoxifen,
Raloxifene)4 or, in severe cases, prophylactic mastectomy5. While all of these interventions are
designed to prevent cancer, they will inadvertently influence the quality of a woman’s life,
especially interventions designed to achieve a reduction in risk within only a few months or
years. Conversely, identifying women at risk at an earlier age may only require relatively small
lifestyle changes, which, acting over a longer period of time, may also achieve an adequate
reduction in risk. To achieve this goal, a technique capable of detecting women at high risk for
breast cancer at an earlier age is required. A physical method of assessing breast cancer risk may
also prove useful in monitoring the efficacy of risk-reducing interventions, because the
intervention may change the physical properties of the tissue, whereas an individual’s
demographic information is not altered by the intervention.

Current  methods  of  establishing  breast  cancer  risk  include  the  Gail  Model6  and 
parenchymal  density  patterns  derived  from  standard  x-ray  mammography.  The  former  method  is 
primarily  based  on  demographic  information  while  the  latter  represents  a  physical  property  of 
tissue.  Parenchymal  density  patterns  reflect  the  ratio  of  glandular  and  connective  tissue  to 
adipose  tissue  within  the  breast.  Women  with  mammographically  dense  tissue  occurring  in  more 
than  70%  of  the  breast  are  4  to  6  times  more  likely  to  develop  breast  cancer  than  those  with 
density  showing  in  less  than  25%  of  the  area7.  Because  of  the  inherent  risks  of  x-rays, 
mammography  is  not  recommended  as  an  annual  diagnostic  for  women  younger  than  50  in  some 
countries  and  younger  than  40  in  others8,  thereby  limiting  the  time  available  for  risk  reduction 
interventions to exert their influence. Similarly, the demographic information required for
inclusion in the Gail Model is often not available until women have reached their late thirties or
forties. Furthermore, the Gail Model is only valid for women older than 30, again forgoing useful
intervention years.

Near-infrared optical transillumination spectroscopy (OTS) has been shown to give
information about breast tissue composition9. Specifically, OTS yields unique optical spectra,
governed by hemoglobin, water and lipid absorption within the tissue and by the average light
scattering power10. The differences in chromophore contributions in the tissue have also been
shown to mirror the x-ray dense and x-ray lucent areas of the mammogram11. In contrast to
mammography, which uses ionizing radiation that predominantly probes the atomic
composition of the tissue, OTS uses non-ionising radiation with quantum energies that interact
with the electronic and vibrational levels of the molecules, and thus reflects more closely the
anatomical and metabolic status of the breast. Consequently, OTS can be used more frequently
and on younger women, and can detect differences in tissue composition between high and
low risk groups based on molecular composition.

In  an  ongoing  feasibility  study  using  mammographic  breast  density  classified  on  a 
nominal  scale  as  an  interim  indicator  of  risk,  our  group  has  shown  that  OTS  can  identify  women 
with  high  breast  tissue  density  with  a  predictive  value  above  85%.  However,  before  OTS  can  be 
implemented  as  a  tool  for  risk  estimation  and/or  as  a  monitoring  technique  during  risk  reducing 
interventions,  it  is  necessary  to  optimize  the  predictive  ability  or  the  relative  risk  provided  by 
OTS, thus minimizing the number of patients given an incorrect risk assessment. By establishing
a relative risk for OTS similar to that of mammography through the use of non-ionizing radiation,
it may be possible to gain two decades in which risk reduction interventions can exert their benefit.

The present investigation is an extension of earlier studies performed by our group, with
the main purpose of improving the predictive power of OTS by identifying required stratifications
on a larger number of volunteers, prior to employing our analysis techniques. Principal
Components Analysis (PCA), using risk classification by a radiologist as the ‘gold standard’, is
employed to establish a correlation between OTS and mammographic density pattern.
Multivariate analysis techniques are used to determine whether stratification of the spectral data by
measurement position on the breast in cranial-caudal projection (center, medial, distal, or lateral)
or by physical parameters such as the volunteer’s age and body mass index (BMI) improves
tissue density prediction by OTS.

Materials  and  Methods 
Patient  recruitment 

The  data  set  used  in  this  study  includes  a  collection  of  mammograms  and  spectral 
information  gathered  from  156  volunteers.  All  volunteers  were  recruited  through  the  Marvelle 
Koffler Breast Centre at Mount Sinai Hospital, Toronto, Ontario. All women had a film
screening mammogram within the 12 months before recruitment, which was negative for
radiologically suspicious lesions. None of the women had undergone previous breast surgery,
including reduction or implants. Volunteer recruitment for this study was approved by the IRBs
of  the  University  of  Toronto  and  the  University  Health  Network.  Informed  consent  was  received 
from  all  volunteers  prior  to  OTS. 

Film  mammograms  were  classified  on  a  nominal  scale  by  a  radiologist  (Dr.  Roberta  Jong, 
Women’s College and Sunnybrook Health Science Centre, Toronto, Ontario) into either low
(<25%), medium (25% to 75%) or high (>75% dense tissue area) density categories. Women who
displayed  bilateral  variations  on  their  mammograms  were  not  included  in  this  analysis.  Sixty-one 
women  were  classified  by  the  radiologist  as  having  low,  68  as  having  medium,  and  27  as  having 
high  tissue  density.  The  population  contributions  of  the  tissue  density  categories  in  this  study 
closely  reflect  the  population  proportions  seen  in  the  Canadian  National  Breast  Screening 
Study13  (Table  1).  The  age  of  the  volunteers  ranged  from  36  to  72  years. 


Table 1 before next section
Visible near-infrared transmission

The instrumentation used to gather transillumination spectra has been described in detail
previously12. Briefly, a 12 W halogen lamp served as the broadband light source; ultraviolet and
part of the visible spectrum were eliminated using a cut-on filter (λ > 550 nm), and mid-infrared
radiation was eliminated using a heat rejection filter. The remaining light was coupled into a 5 mm
diameter liquid light guide [Fiberguide Industries, Bridgewater, NJ] placed in contact with the
skin on top of the breast tissue. A total power of 0.25 W, covering the 550 to 1300 nm band,
was delivered to the skin. Transmitted light was collected via a 7 mm diameter optical fibre
bundle [P & P Optica, Kitchener, Waterloo, 140 fibres, 200 µm, N.A. 0.36]. Wavelength-
dependent detection in the visible and near-infrared was achieved using a spectrophotometer
(Kaiser, California, USA) equipped with a holographic transmission grating (15.7 rules/mm,
blazed at 850 nm) and a 2D cryogenically cooled silicon CCD (Photometrics, New Jersey, USA),
at a spectral resolution better than 3 nm between 625 and 1060 nm. This spectral resolution was
achieved by positioning a 0.5 mm slit between the distal end of the collection fibre and the
spectral grating. The back-thinned CCD has its peak quantum efficiency at 780 nm and retains a
quantum efficiency of 0.2 at 1060 nm. By imaging the entrance slit of the spectrograph onto the
2D CCD, 50 rows of pixels were exposed to the detected light, increasing the dynamic range
of the electronic detection by a factor of more than 30. Cryogenic cooling was used to minimize
background noise. The signal-to-noise ratio was further improved by using exposure times of
3 to 5 seconds and averaging 5 scans for all spectra. Hospital-grade Canadian Standards
Association (CSA) certification was obtained for use of the instrumentation on volunteers.
The Health Canada limit (IEC 825 equivalent)14 for the maximum permissible exposure to
radiation sources at non-therapeutic doses was not exceeded.
All volunteer measurements were taken in the dark, with the volunteer seated
comfortably in an upright position and the breast resting on a horizontal platform. A total of
eight measurements in cranial-caudal projection were taken, four per breast (center, medial,
distal, and lateral; see Fig. 1; these positions are rotated by about 45° relative to the standard
radiological quadrants). A computer was used for system control and data display. Typical data
acquisition for all 8 measurements averaged 160 to 200 seconds each. The source and detector
fibres (optodes) were held coaxially, pointing towards each other, by a caliper attached to the
resting platform, which also provided the interoptode distance. The source fibre was placed
against the skin on the top surface of the breast with minimal compression. Considering typical
tissue optical properties15 and a tissue thickness of 5 cm, an ovoid volume of approximately
30 cm3 is interrogated. In this study, tissue thickness ranged from 2.5 to 8 cm, equivalent to
interrogated volumes of approximately 12 to 54 cm3.

All spectra were corrected for daily variations in the wavelength-dependent signal
transfer function of the optical system and for the thickness of the interrogated tissue, such that all
spectra are independent of the instrument and the interoptode distance. This was achieved by
referencing all spectra to a transmission standard made of 1 cm thick ultra-high-density
polyurethane (Gigahertz Optics, Munich, Germany). Consequently, all volunteer spectra used in
this study are expressed in units of optical density per centimetre [OD cm-1], given by the
negative logarithm of the raw data spectrum divided by the reference spectrum of the
polyurethane block, plus the optical attenuation of the polyurethane block, all divided by the
interoptode distance. The scattering and absorption properties of the standard were measured in a
separate experiment using an integrating sphere diffuse reflectance set-up16 (OD ~1.8 to 2.3 over
the wavelength range of interest).
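The normalization described above can be read as OD(λ) = [−log10(I_raw/I_ref) + OD_ref] / d. The sketch below is a minimal, hypothetical Python illustration of that arithmetic, not the original acquisition software. It assumes base-10 logarithms (the usual convention for optical density) and spectra supplied as equal-length lists sampled at the same wavelengths; the function and argument names are ours.

```python
import math


def od_per_cm(raw, reference, od_reference, interoptode_cm):
    """Convert one transmission spectrum to optical density per centimetre.

    raw: detected intensity through the breast, one value per wavelength
    reference: detected intensity through the 1 cm polyurethane standard
    od_reference: known optical density of the standard at the same wavelengths
    interoptode_cm: source-detector separation (tissue thickness) in cm
    """
    if not (len(raw) == len(reference) == len(od_reference)):
        raise ValueError("all spectra must be sampled at the same wavelengths")
    if interoptode_cm <= 0:
        raise ValueError("interoptode distance must be positive")
    spectrum = []
    for r, ref, od_ref in zip(raw, reference, od_reference):
        # Attenuation relative to the standard, plus the standard's own
        # attenuation, normalised to the optical path length.
        spectrum.append((-math.log10(r / ref) + od_ref) / interoptode_cm)
    return spectrum


# Hypothetical single-wavelength example: the breast transmits 1/100 of the
# reference signal, the standard has OD 2.0, and the breast is 4 cm thick.
print(od_per_cm([0.01], [1.0], [2.0], 4.0))
```

Dividing by the interoptode distance is what makes spectra from breasts of different thickness directly comparable, at the cost of assuming attenuation scales roughly linearly with path length.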


Figure 1 before next paragraph
Data Analysis

Principal Components Analysis (PCA) was used to establish a correlation between the
optical transillumination spectra obtained and mammographic density17. PCA is an analytical
mathematical method, used here to relate vectors (i.e. optical spectra) to nominal data (i.e. tissue
density). PCA relies upon an eigenvector decomposition of the covariance or correlation matrix of
the data matrix X (m × n), whose m = 936 rows are the training set spectra and whose n = 436
columns are the monitored wavelengths. PCA decomposes this data matrix into the sum of the
outer products of vectors $t_i$ and $p_i$, plus a residual matrix $E$:

$$X = \sum_i t_i\, p_i^{T} + E$$

The elements of the $t_i$ vectors, called scores (i.e. scalar coefficients), contain information on
how the samples (i.e. spectra) relate to each other; the $p_i$ vectors, or components, are the
eigenvectors of the covariance matrix and show how the selected variables (i.e. wavelengths)
relate to each other. Notably, the component vectors $p_i$ are orthogonal to one another.
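To make the decomposition concrete, the following sketch computes scores and components for a small data matrix using only the Python standard library. It is an illustration under stated assumptions, not the software used in this study: the columns are mean-centred before the covariance matrix is formed, and the leading eigenvectors are found by power iteration with deflation, which is adequate for small matrices with well-separated eigenvalues but is not a production eigensolver. All function names are hypothetical.

```python
import math
import random


def mean_center(X):
    """Subtract the column (wavelength) means from every row (spectrum)."""
    m, n = len(X), len(X[0])
    means = [sum(row[j] for row in X) / m for j in range(n)]
    return [[row[j] - means[j] for j in range(n)] for row in X], means


def covariance(Xc):
    """Sample covariance matrix (n x n) of a mean-centred m x n matrix."""
    m, n = len(Xc), len(Xc[0])
    return [[sum(Xc[k][i] * Xc[k][j] for k in range(m)) / (m - 1)
             for j in range(n)] for i in range(n)]


def top_eigenvectors(C, k, iters=500, seed=0):
    """Leading k eigenpairs of a symmetric matrix via power iteration + deflation."""
    rng = random.Random(seed)
    n = len(C)
    C = [row[:] for row in C]  # work on a copy; deflation modifies it
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Remove this component so the next pass converges to the next one.
        for i in range(n):
            for j in range(n):
                C[i][j] -= lam * v[i] * v[j]
    return pairs


def pca(X, k):
    """Return scores t (m x k), components p (k x n), eigenvalues and column means."""
    Xc, means = mean_center(X)
    pairs = top_eigenvectors(covariance(Xc), k)
    components = [v for _, v in pairs]
    scores = [[sum(x * p for x, p in zip(row, comp)) for comp in components]
              for row in Xc]
    return scores, components, [lam for lam, _ in pairs], means


# Toy 4 x 2 "spectra" (not study data).
scores, components, eigenvalues, means = pca(
    [[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0], [0.0, -1.0]], 2)
print(eigenvalues)
```

Each eigenvalue is the variance of the corresponding score vector, and the returned components are (numerically) orthogonal unit vectors, which is what allows the scores to be plotted against one another.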

Every individual spectrum can be approximated as a linear combination of the principal
component spectra ($p_i$), where each component is weighted by the scalar coefficient or score
($t_i$) for that individual spectrum. It is generally found that the data can be described by far
fewer components than original variables (n) while still capturing > 99.9% of the total variance.
The first component captures the greatest amount of variation in the data; each subsequent
component captures the greatest possible variance remaining. Because the components are
orthogonal, the scores of two or more components can be plotted against one another. To
identify spectra that are closely related and that exhibit common traits, clusters within these
two- or higher-dimensional plots are analyzed. Clusters are assigned an outcome, here low versus
high tissue density, and lines or planes separating the clusters are determined analytically. Scores
that enable differentiation between tissue densities identify useful component spectra, and hence
specific chromophore contributions to the various tissue densities and thus, indirectly, to risk. For
a more detailed description of PCA and the mathematical models employed, the reader is
referred to Simick (2001)12 and Wise (2000)17.
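The variance bookkeeping and the linear-combination reconstruction described above fit in a few lines. The sketch below assumes the covariance eigenvalues, component spectra and training mean spectrum are already available; the function names and the toy eigenvalues in the usage are illustrative only, not values from this study.

```python
def explained_variance(eigenvalues):
    """Percentage of total variance captured by each component."""
    total = sum(eigenvalues)
    return [100.0 * lam / total for lam in eigenvalues]


def components_needed(eigenvalues, threshold_pct=99.9):
    """Smallest number of leading components whose cumulative variance reaches the threshold."""
    cumulative = 0.0
    ordered = sorted(eigenvalues, reverse=True)
    for k, pct in enumerate(explained_variance(ordered), start=1):
        cumulative += pct
        if cumulative >= threshold_pct:
            return k
    return len(ordered)


def reconstruct(mean, scores, components):
    """Approximate a spectrum as mean + sum_i t_i * p_i over the supplied components."""
    spectrum = list(mean)
    for t, p in zip(scores, components):
        for j, p_j in enumerate(p):
            spectrum[j] += t * p_j
    return spectrum


# Toy eigenvalues (not study data): roughly 80%, 15%, 4% and 1% of the variance.
eigenvalues = [8.0, 1.5, 0.4, 0.1]
print(explained_variance(eigenvalues))
print(components_needed(eigenvalues, threshold_pct=90.0))
```

Truncating the sum after the first few components is what reduces each 436-point spectrum to a handful of scores; whatever is left over is the residual matrix.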

In this study, PCA was executed on a training set comprising 75% of all spectra (n = 936;
117 volunteers × 8 measurements), randomly selected from within each of the defined tissue
density categories. In this manner, the relative contribution of each tissue density category to the
data sets was retained during the analysis. The remaining 25% of the spectra (n = 312; 39
volunteers × 8 measurements) were placed in a validation set (Table 1). The principal
components $p_i$ derived from the training set spectra were then used to predict the scores ($t_i$)
of the validation set spectra, thereby testing the predictive ability of the model. As previously
described, the data were corrected only for the spectral transfer function of the optical system and
the thickness of the interrogated tissue. The data sets were not stratified by menopausal status
(i.e. pre- versus post-menopausal), since a previous study by our group12 demonstrated no
influence of the menstrual cycle on the spectral measurements.
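The split and projection steps could be sketched as follows. This assumes, as the volunteer counts suggest, that all eight spectra from one volunteer go to the same set, and that 75% of each density category is rounded to the nearest whole volunteer; the actual randomization procedure is not specified, so `stratified_split` and `project` are hypothetical helpers. Projection subtracts the training mean before taking dot products with the training components, which matches a mean-centred PCA.

```python
import random
from collections import defaultdict


def stratified_split(density_by_volunteer, train_fraction=0.75, seed=0):
    """Split volunteer IDs into training/validation sets within each density category."""
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for volunteer, category in density_by_volunteer.items():
        by_category[category].append(volunteer)
    train, validation = [], []
    for category in sorted(by_category):
        ids = sorted(by_category[category])
        rng.shuffle(ids)
        n_train = round(train_fraction * len(ids))
        train.extend(ids[:n_train])
        validation.extend(ids[n_train:])
    return train, validation


def project(spectrum, train_mean, components):
    """Score a new spectrum on training-set components: t_i = (x - mean) . p_i."""
    centred = [x - m for x, m in zip(spectrum, train_mean)]
    return [sum(c * p for c, p in zip(centred, comp)) for comp in components]


# Category sizes as reported in Patient recruitment; IDs are hypothetical.
labels = {i: "low" for i in range(61)}
labels.update({i: "medium" for i in range(61, 129)})
labels.update({i: "high" for i in range(129, 156)})
train, validation = stratified_split(labels)
print(len(train), len(validation))
```

With 61 low, 68 medium and 27 high density volunteers, this rounding gives 46 + 51 + 20 = 117 training and 39 validation volunteers. Splitting by volunteer rather than by spectrum keeps a woman's eight correlated measurements from appearing in both sets.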

Furthermore,  as  optical  transillumination  spectroscopy  (OTS)  is  proposed  as  a  physical 
method  for  risk  assessment  applicable  to  the  entire  population,  this  analysis  focuses  only  on 
variables  associated  with  physical  parameters,  more  specifically  measurement  position, 
volunteer’s  age,  and  volunteer’s  body  mass  index  (BMI),  although  information  on  other  risk 
factors,  such  as  family  history  of  breast  cancer,  parity  and  ethnicity,  was  collected  for  each 
volunteer.  All  statistical  analyses  were  carried  out  using  SPSS  11.0  (Statistical  Package  for  the 
Social  Sciences,  SPSS  Inc.,  USA). 

To determine the predictive value of the PCA model, and hence of OTS, two measures
comparable to sensitivity and specificity were determined: the high density measure (HDM) and
the low density measure (LDM). HDM and LDM were determined for both the training and
validation sets using the following equations:

$$\mathrm{HDM} = \frac{TP}{TP + FN}, \qquad \mathrm{LDM} = \frac{TN}{TN + FP}$$

where true positive (TP) is the number of spectra representing high tissue density located below
the separation plane in a three-dimensional cluster plot (defined below), and false negative (FN)
is the number of spectra representing high tissue density situated above the separation plane.
Conversely, true negative (TN) is the number of spectra representing low tissue density located
above the separation plane, and false positive (FP) is the number of spectra representing low
tissue density situated below the separation plane.
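Counting the four outcomes and applying the HDM and LDM formulas is straightforward. The sketch below assumes each spectrum has already been labelled 'high' or 'low' and assigned a side of the separation plane; medium density spectra are simply ignored, since the measures are defined only for the two extreme categories. The function name is hypothetical.

```python
def hdm_ldm(labels, below_plane):
    """Return (HDM, LDM) from per-spectrum density labels and plane sides.

    labels: 'high' or 'low' (other labels such as 'medium' are ignored)
    below_plane: True if the spectrum's scores lie below the separation plane
    """
    tp = fn = tn = fp = 0
    for label, below in zip(labels, below_plane):
        if label == "high":
            if below:
                tp += 1  # high density correctly below the plane
            else:
                fn += 1  # high density wrongly above the plane
        elif label == "low":
            if below:
                fp += 1  # low density wrongly below the plane
            else:
                tn += 1  # low density correctly above the plane
    hdm = tp / (tp + fn) if tp + fn else float("nan")
    ldm = tn / (tn + fp) if tn + fp else float("nan")
    return hdm, ldm


# Hypothetical example: one of two high density spectra is misplaced.
print(hdm_ldm(["high", "high", "low", "low", "medium"],
              [True, False, False, False, True]))
```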

HDM and LDM were estimated by determining the separation plane that differentiated
the high and low tissue density clusters formed by the training set scores in three-dimensional
plots, using either all component scores $t_i$ (n = 528, all 8 measurements per volunteer) or the
mean scores per individual $\bar{t}_i$ (n = 66, Table 1). Linear regression analysis was first used to
calculate the plane of best fit for each of the high and low tissue density clusters, each defined by:

$$t_z = b + a_x t_x + a_y t_y \tag{2.1}$$

where the scores of one randomly selected component serve as the dependent variable $t_z$, $b$ is
the z-intercept, and $a_x$ and $a_y$ are the slopes for the two independent component scores $t_x$
and $t_y$, using either all or the mean scores ($t_i$ or $\bar{t}_i$).

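The equation ($t_{z,\mathrm{crit}}$) of the plane separating the high and low tissue density
training sets was calculated as the plane halfway between the low and high tissue density planes:

$$t_{z,\mathrm{crit}} = \tfrac{1}{2}\left(b_{\mathrm{low}} + a_{x,\mathrm{low}}\, t_{x,\mathrm{low}} + a_{y,\mathrm{low}}\, t_{y,\mathrm{low}}\right) + \tfrac{1}{2}\left(b_{\mathrm{high}} + a_{x,\mathrm{high}}\, t_{x,\mathrm{high}} + a_{y,\mathrm{high}}\, t_{y,\mathrm{high}}\right) \tag{2.2}$$

where $a_{x,\mathrm{low}}$, $a_{y,\mathrm{low}}$ and $a_{x,\mathrm{high}}$, $a_{y,\mathrm{high}}$ are the slopes for the low and the high
tissue density clusters, respectively, $b_{\mathrm{low}}$ and $b_{\mathrm{high}}$ are the respective z-intercepts, and
$t_{x,\mathrm{low}}$, $t_{y,\mathrm{low}}$, $t_{x,\mathrm{high}}$, $t_{y,\mathrm{high}}$ are the respective component scores used ($t_i$ or $\bar{t}_i$).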
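A minimal sketch of fitting Eq. (2.1) to each cluster by ordinary least squares and then applying the halfway plane of Eq. (2.2). It assumes the score triples are given as plain Python lists and evaluates both fitted planes at the same (t_x, t_y) point when computing the critical value. Following the TP definition above, a point below the critical plane is counted as high density, which presumes the high density cluster lies below. All names are hypothetical.

```python
def fit_plane(tx, ty, tz):
    """Least-squares fit of t_z = b + a_x t_x + a_y t_y; returns (b, a_x, a_y)."""
    rows = [(1.0, x, y) for x, y in zip(tx, ty)]
    # Normal equations (A^T A) beta = A^T z, solved by Gauss-Jordan elimination.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atz = [sum(r[i] * z for r, z in zip(rows, tz)) for i in range(3)]
    m = [ata[i] + [atz[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("scores are collinear; the plane is not identifiable")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def critical_z(low_plane, high_plane, x, y):
    """Eq. (2.2): mean of the two fitted planes evaluated at (t_x, t_y) = (x, y)."""
    b_l, ax_l, ay_l = low_plane
    b_h, ax_h, ay_h = high_plane
    return 0.5 * (b_l + ax_l * x + ay_l * y) + 0.5 * (b_h + ax_h * x + ay_h * y)


def below_plane(low_plane, high_plane, x, y, z):
    """True if the point (x, y, z) lies below the separating plane."""
    return z < critical_z(low_plane, high_plane, x, y)


# Points lying exactly on t_z = 1 + 2 t_x + 3 t_y (toy data).
print(fit_plane([0, 1, 0, 1], [0, 0, 1, 1], [1, 3, 4, 6]))
```

The halfway plane is a simple, symmetric choice; it does not weight the clusters by their size or spread, so unequal cluster variances could make a different threshold perform better.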

Differences in component scores t1 to t4 by density classification and by
measurement position (center, medial, distal and lateral) were tested by non-parametric methods,
using either the Mann-Whitney U test or the Kruskal-Wallis rank test. Non-parametric testing was
warranted since, in the majority of cases, the component scores are not normally distributed.
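For readers without a statistics package at hand, the two rank tests named above can be sketched with the standard library. This is an illustrative implementation, not SPSS's. The Mann-Whitney p-value uses the large-sample normal approximation with a tie correction and no continuity correction, so it will differ slightly from exact or continuity-corrected p-values in small samples. The Kruskal-Wallis function returns only the tie-corrected H statistic and its degrees of freedom, to be compared against a chi-square distribution. All function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def tie_term(values):
    """Sum of (t^3 - t) over groups of tied values."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sum(t ** 3 - t for t in counts.values())


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected)."""
    n1, n2 = len(a), len(b)
    pooled = list(a) + list(b)
    n = n1 + n2
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term(pooled) / (n * (n - 1)))
    if var <= 0:
        return u1, 1.0  # all observations tied: no evidence of a difference
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return u1, math.erfc(abs(z) / math.sqrt(2))


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H and its degrees of freedom (k - 1)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        h += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    correction = 1 - tie_term(pooled) / (n ** 3 - n)
    if correction == 0:
        return 0.0, len(groups) - 1  # every value tied
    return h / correction, len(groups) - 1


# Toy scores (not study data) for two and three groups.
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
print(kruskal_wallis_h([1, 2], [3, 4], [5, 6]))
```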

When testing for differences between measurement positions only, measurements from the left
breast were used, since we showed previously12 that there is no significant difference between
component scores from the left and right breast when women showing variability in tissue
density are excluded.

PCA-derived component scores t1 to t4 for the low and high tissue density categories were
examined as a function of a volunteer’s age and body mass index (BMI) using Spearman’s rank
correlation analysis. Linear regression analysis of component scores t1 to t4 as a function of age
or BMI was also performed. For the correlation and linear regression analyses, either all scores
(n = 1248) or scores averaged per individual (n = 156) were used.
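Spearman’s coefficient is Pearson’s correlation computed on ranks, with tied values given their average rank. The sketch below shows that, together with the ordinary least-squares slope and intercept used for score-versus-age or score-versus-BMI regression. The function names are hypothetical, and no p-values are computed.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


# Toy ages and scores (not study data): a monotone but non-linear relation.
print(spearman_rho([40, 50, 60], [1.0, 4.0, 9.0]))
```

Because it depends only on ranks, Spearman’s coefficient is insensitive to the non-normal score distributions noted above, while the regression slope still describes the size of the trend.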

For  all  analyses,  p-values  <  0.05  were  considered  to  be  statistically  significant. 

Results 

PCA of density categories (all measurement positions)

Figure 2 shows the first four principal components (p1 to p4) resulting from the PCA not
stratified for measurement position, and thus based on n = 936 spectra. These first four
components contain 99.87%, 0.06%, 0.05% and 0.01% of the variance in the total data set,
respectively, yielding a combined total of 99.99%.

Figure 2 after first

Box and whisker plots of the first four component scores (t1 to t4) for each density category for
the training and validation sets are presented in Figure 3. A Kruskal-Wallis rank test
demonstrated that the scores for each of the first four components, t1 to t4, differ significantly
between the low, medium and high tissue densities for both the training and validation data sets
(all at p < 0.01). The Mann-Whitney U test demonstrated that the scores t1 to t4 for the training
set differed significantly between all permutations of density classification at p < 0.01. However,
for the validation set, only three scores per permutation showed significance at p < 0.01: t1, t2
and t4 between low and medium tissue density; t1, t2 and t3 between low and high tissue density;
and t2, t3 and t4 between medium and high tissue density.

An example of a three-dimensional cluster plot of component scores t1, t2 and t3 from the
training spectra, using only information from women with either high or low tissue density
(n = 528), is presented in Figure 4. The high and low tissue densities are discriminated by a plane
of separation (not shown) derived analytically from linear regression analysis. A similar 3D plot
using the mean component scores t1, t2 and t3 per individual for those with either high or low
tissue density, for the training and validation sets (n = 88), is shown in Figure 5 and demonstrates
improved cluster separation.

Figures 4 and 5 after paragraph

PCA  of  density  categories  by  measurement  position 

Figure 6 displays the mean component scores t1 to t3 in the training set for each of the four
left-breast measurement positions for the high and low tissue densities, resulting from the
position-independent PCA. For the low tissue density group in the training set, a Kruskal-Wallis
rank test demonstrated a significant difference in component scores t1 to t3 between
measurement positions (all at p < 0.01). For the high tissue density group, only component score
t2 differed significantly between measurement positions. No significant difference in component
score t1 was found between positions (p = 0.055) at a power > 0.6. For component score t3, there
is no significant difference between measurement positions (p = 0.095) at a power > 0.9.

Because of the significant differences observed, PCA was repeated separately for each
measurement  position  (center,  medial,  distal  and  lateral).  Spectra  from  both  the  left  and  right 
breasts  were  used  for  training.  For  each  position,  234  spectra  from  117  volunteers  were  used  to 
train  the  models,  and  78  spectra  from  39  volunteers  were  used  to  validate  them.  Box  plots,  three- 
dimensional  cluster  plots  and  results  of  non-parametric  tests  are  presented  for  the  center  and 
distal  measurement  positions  only,  since  these  positions  provided  the  best  HDM  and  LDM  (see 
below). 

Figure 7 shows the first four principal components (p1 to p4) resulting from PCA stratified for
measurement position. Of the variance in the total data set, the first component p1 contains
between 99.85% and 99.91%, p2 between 0.05% and 0.08%, p3 between 0.03% and 0.05%, and
p4 contains 0.01%.

Box and whisker plots of the first four component scores (t1 to t4) for each density category for
the center and distal positions for the training and validation sets are presented in Figures 8 and
9. For the center position, a Mann-Whitney U test demonstrated a significant difference between
the  low  and  high  tissue  density  for  both  training  and  validation  sets  in  component  scores  tj  and  is 
(p  <  0.01).  For  component  scores  tt  and  ts  in  the  validation  set,  there  is  no  difference  between 
high and low tissue densities (p = 0.579 and p = 0.338) at a power > 0.6 and > 0.9,
respectively. For the distal position, a Mann-Whitney U test demonstrated a significant
difference in component scores t1, t2 and t4 between the low and high tissue density for both
training and validation sets (all at p < 0.01). For component score t3 in the validation set, there is
no significant difference between high and low tissue densities (p = 0.860) at a power > 0.6.

Figure 7 after paragraph

Figures 8

Three-dimensional cluster plots of component scores t1, t3 and t4 at the center position
and scores t1, t2 and t4 at the distal position are shown in Figures 10 and 11, respectively, for the
combined high and low tissue density datasets (n = 176).

HDM  and  LDM 

Table 2 shows the resulting HDM and LDM calculated for the training and validation sets
using either all component scores (training n = 528, validation n = 176) or the mean component
score per individual (training n = 66, validation n = 22), without stratification for measurement
position. For both data sets, the best HDM and LDM values are obtained when the two tissue
density classes are separated with t2 as a function of t1 and t3. In the majority of cases, HDM and
LDM increase when the mean component score per individual is used.

Table  3  provides  the  best  HDM  and  LDM  results  for  the  training  and  validation  sets  for 
each  measurement  position  using  either  or  as  the  dependent  variable.  The  center  and  distal 
positions  show  the  best  HDM  and  LDM  for  the  training  and  validation  sets. 

Number  of  Measurement  Positions 

Figure 12 shows histograms of the frequency of correct high or low tissue density predictions
through OTS for each individual. These show that, if an individual is classified as high density
when scores t1 from 3 or more of four spectra indicate high tissue density, the best HDM and LDM
are obtained using spectra from the center and distal measurement positions only (Fig. 12a).
Spectra from the medial and lateral positions, conversely, result in the lowest HDM and LDM
values (Fig. 12b).
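The per-individual rule described above (call a woman high density when at least 3 of her 4 spectra indicate high density) amounts to a simple vote. The sketch below assumes spectrum-level calls are already available as (volunteer ID, is_high) pairs; the function name and the `min_votes` parameter are hypothetical.

```python
from collections import defaultdict


def individual_calls(spectrum_calls, min_votes=3):
    """Classify each volunteer as high density if at least min_votes spectra say so.

    spectrum_calls: iterable of (volunteer_id, predicted_high) pairs
    """
    votes = defaultdict(int)
    totals = defaultdict(int)
    for volunteer, is_high in spectrum_calls:
        totals[volunteer] += 1
        votes[volunteer] += int(is_high)
    return {volunteer: votes[volunteer] >= min_votes for volunteer in totals}


# Hypothetical calls: volunteer A has 3 of 4 high calls, B has only 2 of 4.
calls = [("A", True)] * 3 + [("A", False)] + [("B", True)] * 2 + [("B", False)] * 2
print(individual_calls(calls))
```

Requiring agreement among several spectra trades a little sensitivity for robustness against a single poorly placed measurement.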


Figures 10 and 11 after paragraph

Tables 2 and 3 after paragraph

Figure 12 after paragraph
PCA and Physical Parameters (Age and BMI)

Linear regression analysis demonstrated a significant correlation between the component
scores and age or BMI. However, further analysis showed that the age-dependent slopes of the
different tissue densities are in effect parallel; hence, without a priori knowledge of the tissue
density, no age or BMI correction can be introduced.

A scatter plot of the mean component scores t1 and t2 per individual as a function of the
volunteer's age for the high and low tissue densities is shown in Figure 13. Figure 14 is a similar
scatter plot of the mean component scores t1 and t2 for each individual versus BMI. The results
of linear regression analysis between age or BMI and the first two component scores are also
indicated. According to Spearman's rank correlation coefficient, a significant but weak correlation
exists between both component scores t1 and t2 and age or BMI. Similar results were obtained
for component scores t3 and t4 (data not shown).
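The correlation and regression analysis described above can be sketched in plain Python. This is an illustrative sketch, not the statistical software used in the study: Spearman's coefficient is computed as the Pearson correlation of average ranks, the regression is ordinary least squares, significance testing is omitted, and the ages and scores are hypothetical. In the study, the fits were made separately for each density group (Figs. 13 and 14).

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def linear_fit(x, y):
    """Ordinary least-squares intercept and slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


# Hypothetical ages and mean t1 scores for one density group.
age = [40, 50, 60, 70]
t1 = [1.0, 3.0, 2.0, 4.0]
print(spearman_rho(age, t1), linear_fit(age, t1))
```

Because Spearman's coefficient uses only ranks, it is less sensitive than the regression slope to a few extreme scores, such as the low-BMI cluster discussed later.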


Discussion 




Principal Component Analysis (all measurement positions)

Significant differences in the component scores between permutations of the density
classification were found for the training set but not for the validation set, specifically for
scores t1 between medium and high tissue densities, t3 between low and medium tissue densities,
and t4 between low and high tissue densities (Fig. 3). This suggests possible overtraining of our
PCA model. That t4 showed no significant difference between the low and high tissue densities
in the validation set indicates that the residual variance of ~0.01% cannot differentiate between
these two extreme density categories.

Input parameters, such as known tissue chromophore spectra, were not used in the training of
our PCA model. Despite this, each derived principal component (p_i) includes a combination of
spectral signatures related to light scattering and chromophore absorption (water, lipid and
hemoglobins), which vary with age [ref], menopausal status [ref], pathology [ref] and breast
tissue density [ref].
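The PCA implementation used in the study (software, centring, scaling) is not described in this section. As an illustration only, the sketch below uses NIPALS, a standard chemometric PCA algorithm whose notation (scores t, component spectra p) matches the one used here, on mean-centred spectra. It is a swapped-in technique, not necessarily the authors' method. As in the study, no chromophore spectra enter the decomposition; the components emerge from the variance of the spectra alone.

```python
def nipals_pca(X, n_components, tol=1e-12, max_iter=500):
    """PCA by NIPALS on spectra X (rows = spectra, columns = wavelengths).

    Returns (scores, loadings, means): scores[k][i] is the score t_(k+1) of
    spectrum i, loadings[k] is the unit-length component spectrum p_(k+1),
    and means is the mean spectrum removed before decomposition. Assumes
    each requested component has non-zero residual variance.
    """
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    E = [[row[j] - means[j] for j in range(m)] for row in X]  # mean-centred residual
    scores, loadings = [], []
    for _ in range(n_components):
        # Start from the residual column with the largest sum of squares.
        j0 = max(range(m), key=lambda j: sum(E[i][j] ** 2 for i in range(n)))
        t = [E[i][j0] for i in range(n)]
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
            norm = sum(v * v for v in p) ** 0.5
            p = [v / norm for v in p]
            t_new = [sum(E[i][j] * p[j] for j in range(m)) for i in range(n)]
            converged = sum((a - b) ** 2 for a, b in zip(t_new, t)) < tol
            t = t_new
            if converged:
                break
        # Deflate: remove the part of the residual explained by this component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        scores.append(t)
        loadings.append(p)
    return scores, loadings, means


# Hypothetical "spectra" at three wavelengths that vary along one direction.
X = [[1 + a, 2 + 2 * a, 3 - a] for a in (0.0, 1.0, 2.0, 3.0)]
scores, loadings, means = nipals_pca(X, 1)
print(scores[0], loadings[0])
```

Each spectrum is approximated as the mean spectrum plus the sum of t_k times p_k; in this rank-one example a single component reconstructs the data exactly.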

Scores associated with principal component p1 can be interpreted as containing information
on wavelength-independent light loss, caused both by increased absorption due to the differential
optical path length and by light losses at the breast boundary. These scores have negative values
in the PCA model (Fig. 2). Light scattering, and hence the differential optical path length, is
tissue and wavelength dependent and can be described by $\mu_s' = a\lambda^{-b}$ [18], where
$\mu_s'$ is the reduced scattering coefficient, $\lambda$ the wavelength, and $a$ and $b$
tissue-dependent constants. Light losses due to scattering occur because of an increase in photon
path length and thus a lower probability of traversing the tissue. A polyurethane standard, with
scattering properties similar to tissue and an albedo > 0.999, was used daily to calibrate the
spectral transfer function of the optical system. By taking the ratio of the tissue transillumination
spectra to the standard spectrum, the wavelength dependence of light scattering in the breast
tissue is minimized or lost. Hence, p1 carries only wavelength-independent optical path length
information, which nevertheless contributes to determining tissue density. For example, low
density tissue has reduced scattering compared with high density tissue [19], resulting in higher
(i.e. less negative) values of t1, which indicate less attenuation or loss of light due to a shorter
optical path length (Fig. 3). The relationship between tissue density and light scattering has been
observed previously by other groups [15, 19-21]: premenopausal tissue (mostly high density) has
a higher scattering coefficient than postmenopausal tissue (mostly low density).
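The calibration step described above can be sketched as follows. This is a simplified sketch under stated assumptions: the standard is treated as the reference, so the result is attenuation relative to the standard. Dividing by the inter-optode distance to obtain OD/cm is inferred from the units used in the figures, not a procedure spelled out in this section. The response and transmittance values are hypothetical. The power law is the $\mu_s' = a\lambda^{-b}$ form quoted above, with arbitrary example constants.

```python
import math


def attenuation_od_per_cm(tissue_counts, standard_counts, optode_distance_cm):
    """Attenuation in optical density per cm at each wavelength.

    Dividing the tissue spectrum by the standard spectrum cancels any
    wavelength-dependent instrument response that multiplies both equally;
    dividing by the inter-optode distance normalizes for tissue thickness.
    """
    return [-math.log10(t / s) / optode_distance_cm
            for t, s in zip(tissue_counts, standard_counts)]


def reduced_scattering(wavelength_nm, a, b):
    """Power-law reduced scattering coefficient mu_s' = a * wavelength**(-b)."""
    return a * wavelength_nm ** (-b)


# Hypothetical instrument response and tissue transmittance at three wavelengths.
response = [2.0, 5.0, 0.5]
transmittance = [0.1, 0.01, 0.001]
tissue = [r * T for r, T in zip(response, transmittance)]
standard = list(response)  # reference measured through the same optics
print(attenuation_od_per_cm(tissue, standard, optode_distance_cm=2.0))
```

The printed attenuation (about 0.5, 1.0 and 1.5 OD/cm) does not depend on `response`, which is why daily calibration against the standard makes the spectra comparable between sessions.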

Light losses at the breast boundary are also captured in p1. The differences in component
scores t1 observed between positions (Fig. 6) can be explained by light losses where the breast
boundary is parallel to the optical axis between the optodes (medial, lateral, and distal positions).
We have shown previously that measurements taken at the medial position differ from the
center measurement and resemble measurements made closer to the edge of the breast [12]. The
larger t1 values at the center position indicate minimal light losses due to boundary conditions.
Light losses are highest at the distal position, where the shift towards smaller scores suggests
more transmission attenuation due to light losses. These observations suggest that stratification
by measurement position is beneficial.

The most important spectral features in the spectrum of p2 are the lipid peak around 925 nm
and the inverse water peak around 975 nm (Fig. 2). Low density tissue consists mainly of
adipose tissue and has positive scores t2, so that the lipid peak is the dominant spectral feature
(Fig. 3). Smaller contributions in the low tissue density spectrum are also evident at 770 nm
(deoxy hemoglobin) and at 825 nm, a minor lipid absorption peak. Conversely, high density tissue
has a large water content [22], and its scores t2 are predominantly negative, so that the water
peak is the dominant spectral feature (Fig. 3). Contributions from deoxy hemoglobin are also
evident in the high tissue density spectrum of p2 between 625 and 750 nm, where the negative
slope of the deoxy hemoglobin curve is visible.

The spectrum of p3 is relatively flat between 625 nm and 875 nm, which can be attributed
to contributions from the oxy hemoglobin curve (Fig. 2). Another notable feature is that the lipid
and water peaks are positive, in contrast to the spectra of p2 and p4 (see below). Component scores
t3 for low density tissue are positive, with the lipid peak as the dominant spectral feature (Fig. 3).
Component scores t3 for high density tissue are negative, suggesting a shift from an oxy
hemoglobin curve to a deoxy hemoglobin curve (as seen in the spectrum of p1). Since decreased
oxygenation is associated with increased cellular metabolism [21], and high density tissue
presumably contains more metabolically active fibroglandular tissue, oxygen saturation is
expected to be lower in high density tissue than in low density tissue.
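For reference, oxygen saturation in its usual sense is the oxy hemoglobin fraction of total hemoglobin. The component scores do not yield these concentrations directly, so the expression below only makes the argument concrete; the concentrations in the worked example are hypothetical.

```latex
S_tO_2 = \frac{c_{\mathrm{HbO_2}}}{c_{\mathrm{HbO_2}} + c_{\mathrm{Hb}}},
\qquad
\text{e.g. } c_{\mathrm{HbO_2}} = 15~\mu\mathrm{M},\; c_{\mathrm{Hb}} = 5~\mu\mathrm{M}
\;\Rightarrow\; S_tO_2 = \frac{15}{20} = 0.75 .
```

A shift of hemoglobin from the oxy to the deoxy form at constant total hemoglobin lowers $S_tO_2$, which is the change the negative t3 scores of high density tissue are taken to indicate.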
The spectrum of p4 is similar to that of p2 (Fig. 2); however, component scores t4 are negative
for low density tissue and positive for high density tissue (Fig. 3). The lipid peak at 925 nm and
the water peak at 975 nm are the dominant spectral features for low and high density tissue,
respectively. Smaller contributions between 775 nm and 875 nm, presumably from hemoglobin
saturation, are also noticeable.

Cluster plots in the three dimensional space defined by t1, t2 and t3 show good separation
between high and low tissue densities, both when all component scores are used (Fig. 4) and when
the mean score per individual is used (Fig. 5). In both cluster plots, scores for low density tissue
are tightly clustered above a plane of separation, whereas scores for high density tissue are more
spread out and lie below it. The low density cluster is tight even though low density breasts are
more common than high density breasts in both the general population and our study population
(Table 1).

The results of the positional analysis (Fig. 6) suggest that the PCA model should be trained
independently for each measurement position (center, medial, distal, and lateral). For instance,
analysis of component scores t2 and t3 by measurement position demonstrates differences in
wavelength-dependent attenuation within various regions of the breast. The smaller values of
component scores t2 at the distal and lateral positions indicate greater attenuation by water at
these positions than at the center and medial positions, for both density groups. This
water-associated increase in attenuation can be explained by the location of the ducts (distal) and
the mammary glands (lateral), respectively. Component scores t3 suggest a decrease in attenuation
due to oxy hemoglobin and a concomitant increase due to deoxy hemoglobin in the distal and
lateral regions of the breast. Hence, the spatial distribution of the various tissues within the breast
is well reflected in the scores.
Principal Component Analysis by measurement position

The results of the PCA for each individual position and the subsequent analysis of
cluster plots in three dimensional space suggest that only two positions need to be interrogated:
the center and the distal. The high predictive value of the center position is explained by the fact
that parenchymal densities are rarely observed in this area of the breast in women with low and
medium tissue densities. Conversely, the high predictive value of the distal position results from
the location of the lactiferous ducts.

In general, the component spectra p_i for these two positions are comparable, with the exception
that component spectrum p2 for the center position is similar to component spectrum p3 for the
distal position, and vice versa (Fig. 7). Both of these component spectra (p2 at the center and p3
at the distal position) are spectrally flat over a large wavelength region, with poor differentiation
of the lipid and water peaks. Consequently, when the corresponding scores are omitted, cluster
plots in the three dimensional space defined by t1, t3 and t4 result in a good separation between
high and low tissue densities for the center position (Fig. 10), and cluster plots in the space
defined by t1, t2 and t4 result in a good separation for the distal position (Fig. 11). Mann-Whitney
U tests likewise showed no significant difference between low and high tissue densities
(validation set) in component scores t2 for the center position or in component scores t3 for the
distal position (Figs. 8 and 9). This exchange of significance between p2 and p3 at the center and
distal positions most likely reflects the small amount of variance these components capture,
combined with the limited number of observations available to date. The component spectra p_i
for the medial and lateral positions show similar shapes, with those of the lateral position
resembling the distal and those of the medial position resembling the center.
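The Mann-Whitney U comparison referred to above can be sketched as follows. The U statistic is computed directly by counting pairs. The p-value uses the large-sample normal approximation without tie or continuity correction, which is a simplification: for samples as small as the validation set, an exact or tie-corrected p-value from a statistics package would normally be preferred. The scores are hypothetical, and the study's software is not stated here.

```python
import math


def mann_whitney_u(x, y):
    """Mann-Whitney U statistics for samples x and y.

    U_x counts pairs (a in x, b in y) with a > b, ties counting one half;
    U_x + U_y == len(x) * len(y).
    """
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    return u_x, len(x) * len(y) - u_x


def two_sided_p_normal(x, y):
    """z and two-sided p-value from the large-sample normal approximation
    (no tie or continuity correction)."""
    n1, n2 = len(x), len(y)
    u_x, _ = mann_whitney_u(x, y)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mu) / sigma
    return z, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical t2 scores (one position, validation set) for two density classes.
low = [0.42, 0.35, 0.51, 0.28, 0.47]
high = [-0.10, 0.05, 0.30, -0.22]
print(mann_whitney_u(low, high), two_sided_p_normal(low, high))
```

Because the test uses only the ordering of scores, it suits the non-normal, outlier-prone score distributions shown in the box plots.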

HDM and LDM

Because the radiologically assigned parenchymal density is a global assessment of the breast,
calculating HDM and LDM from the component scores of all eight spectra individually introduces
additional variability, as density is not distributed homogeneously throughout the tissue. Using
the mean score per individual reduces the variability in the OTS-predicted density within an
individual's breast tissue (Fig. 5). This results in improved HDM and LDM for our validation set
when the mean scores defined by t1, t2 and t3 or by t1, t2 and t4 are plotted in three dimensional
space (Table 2). However, when the mean scores defined by t1, t3 and t4 or by t2, t3 and t4 are
plotted, the LDM increases but the HDM decreases. This latter observation suggests that the
lipid-to-water ratio captured by component p2 is better at identifying high tissue density than the
oxy-to-deoxy hemoglobin ratio captured by component p3.

The best estimate of a global classification is obtained by pooling spectra from different
positions (n = 2), since this best reflects the patchy nature of the parenchyma. The highest HDM
and LDM are achieved using spectra from the center and distal positions, with an individual
classified as high density if three or more of the four spectra indicate high tissue density; both
measures are then above 96% (Fig. 12a). The HDM and LDM based on these two positions might
slightly overestimate the predictive power of the test, given that women whose two breasts
differed in density were excluded from the analysis. However, this also suggests that improved
PCA training is possible if the regional tissue density associated with each optically interrogated
volume is known.

The medial position provides acceptable HDM and LDM for our validation set. The lateral
position provides an LDM comparable to that of the distal position, but the lowest HDM of all
positions for our validation set. The differences in predictive value between the measurement
positions directly reflect the spatial prevalence of the parenchymal density pattern within the
breast 2i. 
The HDM and LDM values obtained in our analysis are limited by two factors. First, only
one radiologist (R. Jong) was involved in reading and classifying the film mammograms.
Second, and more importantly, we have approximated the boundary between our three
dimensional clusters by a linear function (a plane). Future analysis will therefore focus on
improved cluster analysis of the component scores in three dimensional space, which is likely to
improve the HDM and LDM.
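How the linear boundary mentioned above was obtained is not stated in this section. As one simple possibility, the sketch fits a least-squares linear discriminant: a plane in (t1, t2, t3) space regressed onto +1 for low and -1 for high density. This is a technique swapped in for illustration only. Fisher's linear discriminant, logistic regression, or the non-linear cluster analysis the authors propose could replace it. The function names and scores are hypothetical.

```python
def fit_separating_plane(points, labels):
    """Least-squares linear discriminant in 3-D score space.

    Fits w0 + w1*x + w2*y + w3*z to +1 for 'low' and -1 for 'high' points by
    solving the normal equations with Gaussian elimination (partial pivoting).
    Assumes the points are not all coplanar, so the system is non-singular.
    """
    rows = [[1.0, *p] for p in points]
    target = [1.0 if lab == "low" else -1.0 for lab in labels]
    k = 4
    # Augmented normal-equation matrix [A^T A | A^T b].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * t for r, t in zip(rows, target))] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, k):
            f = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= f * aug[col][c]
    w = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        w[i] = (aug[i][k] - sum(aug[i][j] * w[j] for j in range(i + 1, k))) / aug[i][i]
    return w


def predict(w, point):
    """'low' on the positive side of the fitted plane, otherwise 'high'."""
    s = w[0] + sum(wi * xi for wi, xi in zip(w[1:], point))
    return "low" if s > 0 else "high"


# Hypothetical (t1, t2, t3) scores: low density has positive t2, high negative.
pts = [(0, 1, 0), (1, 1, 1), (0, 2, 1), (1, 2, 0),
       (0, -1, 0), (1, -1, 1), (0, -2, 1), (1, -2, 0)]
labs = ["low"] * 4 + ["high"] * 4
w = fit_separating_plane(pts, labs)
print([round(v, 6) for v in w], [predict(w, p) for p in pts] == labs)
```

A single plane cannot follow the tight low density cluster and the more diffuse high density cluster equally well, which is the limitation noted above.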

The HDM and LDM achieved by OTS for density classification are comparable to the sensitivity
and specificity reported for breast cancer detection by mammography and other spectroscopic
techniques. A meta-analysis of the published literature showed that the sensitivity of
mammography alone, for all ages combined, varied from 83% to 95% [24]. Other spectroscopy
studies (e.g. optical and elastic scattering spectroscopy) examining the accuracy of these
techniques in diagnosing breast cancer have achieved sensitivities between 58% and 91% and
specificities between 74% and 93% [11, 25].

Component scores and physical parameters (Age and BMI)

As women age, one would expect an increase in component scores t1 and t2, since atrophy of
the glandular tissue, which continues past menopause, results in less light scattering and an
increase in lipid content [26]. Figure 13 shows only a small increase in component scores t1 with
age for both low and high density tissue, and little to no increase in t2 with age for either tissue
density category. This suggests that, when OTS is used as a predictive tool for breast cancer risk,
its odds ratio or relative risk should be independent of age. Similarly, one would expect
component scores t1 and t2 to increase with BMI, and a small but significant association exists
between BMI and component scores t1 and t2 for the low tissue density group (Fig. 14). A cluster
of scores t1 and t2 from high density tissue at low BMI produces an apparently large slope for this
group; however, the correlation coefficients are weak.

While some statistically significant correlations can be established between age or BMI
and the component scores t1 or t2, these associations also depend on a woman's density
classification. Consequently, no correction for age or BMI is possible, since the density of a
woman undergoing OTS measurements is not known a priori. This suggests that the
physiological changes in breast tissue density due to age and BMI are already contained in the
component spectra and hence captured by the derived component scores.

Conclusion 

OTS is a physical assessment technique applicable to the entire population that requires only
the inter-optode distance and the measurement position on the breast in craniocaudal projection
(center, medial, distal, and lateral). Because the spectra are corrected for the inter-optode
distance and for the transfer function of the optical system, they are independent of the
instrument and of the interrogated tissue thickness, and are hence portable between instruments.
Furthermore, only two positions, the center and the distal, need to be interrogated to produce
HDM and LDM values above 96%. Future work should focus on establishing a direct correlation
with risk, so as not to depend on the limited odds ratio of the currently chosen intermediate
standard, breast tissue density.
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Tables 


Table 1. Breakdown of study volunteers, including study and population proportions and the
total number of spectra analysed (in parentheses).

Density     Training    Validation   Total         Study        Population
Category    Set         Set                        Proportion   Proportion
Low          46 (368)   15 (120)      61 (488)     39.0%        37.0%
Medium       51 (408)   17 (136)      68 (544)     43.6%        49.0%
High         20 (160)    7 (56)       27 (216)     17.3%        14.0%
Totals      117 (936)   39 (312)     156 (1248)
Table 2. HDM and LDM for the training and validation sets using all component scores and the
mean scores per individual.

                        Training Set       Validation Set
Scores in equation      HDM      LDM       HDM      LDM
All scores
  t1, t2, t3            76.9%    88.3%     75.0%    96.7%
  t1, t2, t4            69.4%    87.5%     64.3%    92.5%
  t1, t3, t4            48.1%    81.0%     58.9%    80.8%
  t2, t3, t4            56.9%    84.5%     60.7%    86.7%
Mean scores
  t1, t2, t3            85.0%    89.1%     85.7%    100.0%
  t1, t2, t4            85.0%    82.6%     71.4%    93.3%
  t1, t3, t4            50.0%    87.0%     42.9%    86.7%
  t2, t3, t4            60.0%    89.1%     57.1%    93.3%


Table 3. HDM and LDM for the training and validation sets for each measurement position.

                              Training Set       Validation Set
Position   Equation used      HDM      LDM       HDM      LDM
Center     t3 = f(t1, t4)     95.0%    87.0%     92.9%    90.0%
Distal     t2 = f(t1, t4)     90.0%    91.3%     100.0%   100.0%
Medial                        77.5%    71.7%     85.7%    86.7%
Lateral                       80.0%    95.7%     71.4%    100.0%
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Figure  Captions 

Figure 1. Location of the transillumination measurements in the craniocaudal projection on a
standardized volunteer. The setup shown is for the center position. Other positions include
medial (M), distal (D), and lateral (L).

Figure 2. Plot of principal components p1 to p4 of thickness and transfer function corrected
spectra.

Figure 3. Box plots of component scores t1, t2, t3 and t4 for low, medium and high tissue density;
training spectra are to the left of the center line, validation spectra to the right. Circles (> 2σ) and
asterisks (> 3σ) are outliers.

Figure 4. Three-dimensional cluster plot of t1, t2 and t3 resulting from thickness and transfer
function corrected spectra of high and low tissue density. Only the training data set is shown
(n = 528). Open circles, spectra from tissue classified as low density; closed circles, spectra from
tissue classified as high density.

Figure 5. Three-dimensional cluster plot of mean scores t1, t2 and t3 resulting from thickness
and transfer function corrected spectra of high and low tissue density (n = 88). Open circles and
squares, spectra from tissue classified as low density, training and validation set, respectively;
closed circles and squares, spectra from tissue classified as high density, training and validation
set, respectively.

Figure 6. Mean component scores t1 (a), t2 (b), and t3 (c) for the four left measurement
positions (center = LC, medial = LM, distal = LD, lateral = LL) for low (open circles) and
high (closed circles) density tissue. Error bars represent 95% confidence intervals of the mean.
Figure 7. Plots of principal components p1 to p4 for each measurement position: (a) center,
(b) medial, (c) distal, and (d) lateral.

Figure 8. Box plots of component scores t1, t2, t3 and t4 for low, medium and high density tissue
for the center position; training spectra are to the left of the center line, validation spectra to the
right. Circles (> 2σ) and asterisks (> 3σ) are outliers.

Figure 9. Box plots of component scores t1, t2, t3 and t4 for low, medium and high density tissue
for the distal position; training spectra are to the left of the center line, validation spectra to the
right. Circles (> 2σ) and asterisks (> 3σ) are outliers.

Figure 10. Three-dimensional cluster plot of t1, t3 and t4 resulting from thickness and transfer
function corrected spectra from high and low density tissue for the center position (n = 176).
Open symbols are low tissue density; closed symbols are high tissue density. Circles refer to the
training set and squares to the validation set.

Figure 11. Three-dimensional cluster plot of t1, t2 and t4 resulting from thickness and transfer
function corrected spectra from high and low density tissue for the distal position (n = 176).
Open symbols are low tissue density; closed symbols are high tissue density. Circles refer to the
training set and squares to the validation set.

Figure 12. The number of spectra from the center and distal positions (a) and the medial and
lateral positions (b) that correctly predicted high tissue density. High tissue density shown in
black, low tissue density shown in grey.

Figure 13. Scatter plots of averaged component scores t1 (a) and t2 (b) per individual as a
function of age for high and low density categories (n = 88). Open circles and solid regression
line represent low tissue density; closed circles and dashed line represent high tissue density.
Figure 14. Scatter plots of averaged component scores t1 (a) and t2 (b) per individual as a
function of BMI for high and low density categories (n = 88). Open circles and solid regression
line represent low tissue density; closed circles and dashed line represent high tissue density.
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Figure  5. 
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Figure  6. 
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Figure  7. 


[Plot axis label: Attenuation (OD/cm) at centre]

Figure  11. 
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